


Office de la Propriété Intellectuelle du Canada
Un organisme d'Industrie Canada

Canadian Intellectual Property Office
An agency of Industry Canada



CA 2563478 A1 2005/10/27

(21) 2 563 478 

DEMANDE DE BREVET CANADIEN 
CANADIAN PATENT APPLICATION 

(13) A1 



(86) Date de dépôt PCT/PCT Filing Date: 2005/04/18

(87) Date publication PCT/PCT Publication Date: 2005/10/27

(85) Entrée phase nationale/National Entry: 2006/10/16

(86) N° demande PCT/PCT Application No.: US 2005/013132

(87) N° publication PCT/PCT Publication No.: 2005/099423

(30) Priorité/Priority: 2004/04/16 (US 60/563,091)



(51) Cl.Int./Int.Cl. H04N 7^6 (2006.01)
H04N 5/262(2006.01) 

(71) Demandeurs/Applicants:
AMAN, JAMES A., US;
BENNETT, PAUL MICHAEL, US

(72) Inventeurs/Inventors:
AMAN, JAMES A., US;
BENNETT, PAUL MICHAEL, US

(74) Agent: BLAKE, CASSELS & GRAYDON LLP 



(54) Titre : SYSTEME AUTOMATIQUE PERMETTANT DE FILMER EN VIDEO, DE SUIVRE UN EVENEMENT ET DE GENERER UN CONTENU
(54) Title: AUTOMATIC EVENT VIDEOING, TRACKING AND CONTENT GENERATION SYSTEM








(57) Abrégé/Abstract

An automatic system 100 that uses one to three grids 20cm of overhead cameras 20c to first video an event area 2. Overall
bandwidth is greatly reduced by intelligent hubs 26 that extract foreground blocks 10m based upon initial and continuously updated
background images 2r. The hubs also analyze current images 10c to constantly locate, classify and track in 3D the limited number
of expected foreground objects 10. As objects 10 of interest are tracked, the system automatically directs ptz perspective view
cameras 40c to follow the activities. These asynchronous cameras 40c limit their images to defined repeatable pt angles and zoom
depths. Pre-captured venue backgrounds 2r at each repeatable ptz setting facilitate perspective foreground extraction. The moving
background, such as spectators 13, is removed with various techniques including stereoscopic side cameras 40c-b and 40c-c
flanking each perspective camera 40c. The tracking data 101 derived from the overhead view 102 establishes event performance
measurement and analysis data 701. The analysis results in statistics and descriptive performance tokens 702 translatable via
speech synthesis into audible descriptions of the event activities corresponding to overhead 102 and perspective video 202.



CA 02563478 2006-10-16 



(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property
Organization 
International Bureau 

(43) International Publication Date 
27 October 2005 (27.10.2005) 




(10) International Publication Number 

WO 2005/099423 A2



(51) International Patent Classification: Not classified 

(21) International Application Number: 

PCT/US2005/013132 

(22) International Filing Date: 18 April 2005 (18.04.2005) 

(25) Filing Language: English 

(26) Publication Language: English 

(30) Priority Data: 

60/563,091 16 April 2004 (16.04.2004) US 

(63) Related by continuation (CON) or continuation-in-part
(CIP) to earlier application:

US 10/006,444 (CIP) 

Filed on 20 November 2001 (20.11.2001) 



(81) Designated States (unless otherwise indicated, for every
kind of national protection available): AE, AG, AL, AM,
AT, AU, AZ, BA, BB, BG, BR, BW, BY, BZ, CA, CH, CN,
CO, CR, CU, CZ, DE, DK, DM, DZ, EC, EE, EG, ES, FI,
GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE,
KG, KM, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA,
MD, MG, MK, MN, MW, MX, MZ, NA, NI, NO, NZ, OM,
PG, PH, PL, PT, RO, RU, SC, SD, SE, SG, SK, SL, SM, SY,
TJ, TM, TN, TR, TT, TZ, UA, UG, US (patent), UZ, VC,
VN, YU, ZA, ZM, ZW.

(84) Designated States (unless otherwise indicated, for every
kind of regional protection available): ARIPO (BW, GH,
GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM,
ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI,
FR, GB, GR, HU, IE, IS, IT, LT, LU, MC, NL, PL, PT, RO,
SE, SI, SK, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN,
GQ, GW, ML, MR, NE, SN, TD, TG).



(71) Applicants and 

(72) Inventors: AMAN, James, A. [US/US]; 802 Wexford 
Way, Telford, PA 18964 (US). BENNETT, Paul, Michael 
[US/US]; 31 Saratoga Lane, Harleysville, PA 19438 (US). 

(74) Agent: NIGON, Kenneth, N.; RatnerPrestia, P.O. Box
980, Valley Forge, PA 19482 (US).



Published: 

— without international search report and to be republished 
upon receipt of that report

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.



(54) Title: AUTOMATIC EVENT VIDEOING, TRACKING AND CONTENT GENERATION SYSTEM




(57) Abstract: An automatic system 100 that uses one to three grids 20cm of overhead cameras 20c to first video an event area 2.
Overall bandwidth is greatly reduced by intelligent hubs 26 that extract foreground blocks 10m based upon initial and continuously
updated background images 2r. The hubs also analyze current images 10c to constantly locate, classify and track in 3D the limited
number of expected foreground objects 10. As objects 10 of interest are tracked, the system automatically directs ptz perspective
view cameras 40c to follow the activities. These asynchronous cameras 40c limit their images to defined repeatable pt angles and
zoom depths. Pre-captured venue backgrounds 2r at each repeatable ptz setting facilitate perspective foreground extraction. The
moving background, such as spectators 13, is removed with various techniques including stereoscopic side cameras 40c-b and 40c-c
flanking each perspective camera 40c. The tracking data 101 derived from the overhead view 102 establishes event performance
measurement and analysis data 701. The analysis results in statistics and descriptive performance tokens 702 translatable via speech
synthesis into audible descriptions of the event activities corresponding to overhead 102 and perspective video 202.



WO 2005/099423    PCT/US2005/013132

Title of the Invention

Automatic Event Videoing, Tracking and Content Generation System
Technical Field 

The present invention relates to automatic systems for videoing an event, tracking its participants and 
subsequently creating multi-media content and broadcasts. 
Background Art 

The present invention is a continuation in part of U.S. patent application number 10/006,444, filed on
November 20th, 2001 entitled Optimizations for Live Event, Real-Time, 3-D Object Tracking that is
pending.

Furthermore, the present invention incorporates by reference and claims the benefit of priority of the U.S.
provisional application 60/563,091, filed on April 16th, 2004, entitled Automatic Sports Broadcasting
System, with the same named inventors.

By today's standards, a multi-media sporting event broadcast that might typically be viewed through a
television includes at least the following information:

• video of the game, preferably spliced together from multiple views;

• replays of key events;

• audio of the game;

• graphic overlays of key statistics such as the score and other basic game metrics;

• ongoing "play-by-play" audio commentary;

• graphic overlays providing game analysis and summaries, and

• advertisements inserted as clips during game breaks or as graphic overlays during play.

Furthermore, after or while this information is collected, generated and assembled, it must also be encoded
for transmission to one or more remote viewing devices such as a television or computer, typically in
real-time. Once received on the remote viewing device, it must also be decoded and therefore returned to a
stream of visual and auditory output for the viewer.

Any manual, semi-automatic or automatic system designed to create this type of multi-media broadcast
must at least be able to:

• track official game start / stop times, calls and scoring;

• track participant and game object movement;

• collect game video and audio;

• analyze participant and game object movement;

• create game statistics and commentary based upon the game analysis;

• insert advertisements as separate video / audio clips or graphic overlays;

• encode and decode a broadcast of the streams of video, audio, and game metric information.

The present inventors are not aware of any fully automatic systems for creating sports broadcasts. There
are many drawbacks to the current largely manual systems and methodologies, some of which are identified
as follows:




• the cost of creating such broadcasts is significant both in terms of equipment and labor and
therefore excludes smaller markets such as amateur and youth sports;

• for practical reasons such as equipment and labor costs, the number of filming cameras is limited;

• the typical broadcaster relies upon manually operated filming cameras to anticipate and follow the
game action, but in practice it is difficult to consistently capture the more important and
interesting events from the most desirable angles;

• there is currently no practical means of creating a complete overhead view of the ongoing game
that can be best used for game analysis and explanation;

• current videoing technology is synchronized to the broadcast standards, such as NTSC, which
regulate the frequency of image capture to be 29.97 frames per second, which is consequently
out-of-sync with typical indoor high-wattage lighting systems that fluctuate at intervals of 120 times
per second, thus causing inconsistent lighting conditions per individual image frame;

• current filming technology is all based in visible light and does not take advantage of potential
information collection that is possible in the non-visible spectrums;

• while some current systems can follow the game object, such as a puck, they cannot also
automatically identify and track all participants, determining their locations and orientation
throughout the entire contest;

    o while some systems can automatically film the game centered around the detected
    location of the game object, they cannot additionally anticipate action based upon the
    knowledge of tracked participants or direct other cameras to follow these tracked
    participants;

• current systems cannot automatically track key spectators such as coaches, family members and
other VIPs so as to automatically film them during or after key game action;

• game analysis, especially for more dynamic and fast moving sports such as ice hockey, can
require hundreds to thousands of ongoing observations which are extremely difficult for manual
systems to accurately record, let alone interpret in real-time;

• there are currently no systems capable of creating a flow of tokens to describe game action that
can be used to automatically direct synthesized and pre-recorded speech adding commentary to
the ongoing game;

• while inserting advertisements as clips into the ongoing game feed is relatively straightforward,
adding overlaid graphics to the game action video is more problematic and requires greater forms
of automation;

• current practice typically does not automate the interface between the official game start and stop
times in order to help automatically regulate the broadcast stream of live action, replays,
commentary video and advertisements;

• current practice typically does not automate the interface between the official scorekeeper in order
to help automatically determine official game scoring, penalties and other rulings;




• current systems have no way of delineating game events based upon tracked participants and
information collected from an interface with the official scoring and ruling system;

• current broadcasts are primarily designed to be output through a television and are therefore
limited especially to the tv's display and computational shortcomings as well as its smaller
broadcast bandwidths that constrain the total amount of presentable information;

• while targeted for television output, broadcasts are not designed to take advantage of current
computer technology that is now able to generate realistic graphic renderings of both the human
form and surrounding environments in real-time;

• current broadcasts are not interactive in a way that allows the viewer to dynamically select between
multiple video feeds to be viewed either singularly or in combination;

• current encoding techniques do not take advantage of newer video and audio compression
technologies or possibilities, therefore wasting bandwidth that could be used to either provide
additional information or to conserve broadcaster capacity.

Traditionally, professional broadcasts have relied upon a team of individuals working on various aspects
of this list of tasks. For instance, a crew of cameramen would be responsible for filming a game from
various angles using fixed and / or roving cameras. These cameras may also collect audio from the playing
area and / or crew members would use fixed and / or roving microphones. Broadcasters would typically
employ professional commentators to watch the game and provide both play-by-play descriptions and
ongoing opinions and analysis. These commentators have access to the game scoreboard and can both see
and hear the officials and referees as they oversee the game. They are also typically supported by
statisticians who create meaningful game analysis summaries and are therefore able to provide both official
and unofficial game statistics as audio commentary. Alternatively, this same information may be presented
as graphic overlays onto the video stream with or without audible comment. All of this collected and
generated information is then presented simultaneously to a production crew that selects the specific
camera views and auditory streams to meld into a single presentation. This production crew has access to
official game start and stop times and uses this information to control the flow of inserted advertisements
and game action replays. The equipment used by the production team automatically encodes the broadcast
into a universally accepted form which is then transmitted, or broadcast, to any and all potential viewing
devices. The typical device is already built to accept the broadcaster's encoded stream and to decode this
into a set of video and audio signals that can be presented to the viewers through appropriate devices such
as a television and / or multi-media equipment.

Currently, there are no fully, or even semi-automatic systems for creating a video and / or audio broadcast
of a sporting event. The first major problem that must be solved in order to create such a system is:

How does an automated system become "aware" of the game activities?

Any fully automated broadcast system would have to be predicated on the ability of a tracking system to
continuously follow and record the location and orientation of all participants, such as players and game
officials, as well as the game object, such as a puck, basketball or football. The present inventors taught a
solution for this requirement in their first application entitled "Multiple Object Tracking System."
Additional novel teachings were disclosed in their continuing application entitled "Optimizations for Live
Event, Real-Time, 3-D Object Tracking." Both of these applications specified the use of cameras to collect
video images of game activities followed by image analysis directed towards efficiently determining the
location and orientation of participants and game objects. Important techniques were taught including the
idea of gathering overall object movement from a grid of fixed overhead cameras that would then
automatically direct any number of calibrated perspective tracking and filming cameras.

Other tracking systems exist in the market such as those provided by Motion Analysis Corporation. Their
system, however, is based on fixed cameras placed at perspective filming angles thereby creating a filled
volume of space in which the movements of participants could be adequately detected from two or more
angles at all times. This approach has several drawbacks including the difficult nature of uniformly scaling
the system in order to encompass the different sizes and shapes of playing areas. Furthermore, the fixed
view of the perspective cameras is overly susceptible to occlusions as two or more participants fill the
same viewing space. The present inventors prefer first determining location and orientation based upon the
overhead view which is almost always un-blocked regardless of the number of participants. While the
overhead cameras cannot sufficiently view the entire body, the location and orientation information
derived from their images is ideal for automatically directing a multiplicity of calibrated perspective
cameras to minimize player occlusions and maximize body views. The Motion Analysis system also relied
upon visible, physically intrusive markings including the placement of forty or more retroreflective spheres
attached to key body joints and locations. It was neither designed nor intended to be used in a live sporting
environment. A further drawback to using this system for automatic sports broadcasting is its filtering of
captured images for the purposes of optimizing tracking marker recognition. Hence, the resulting image is
insufficient for broadcasting and therefore a complete second set of cameras would be required to collect
the game film.

Similarly, companies such as Trakus, Inc. proposed solutions for tracking key body points (in the case of
ice hockey, a player's helmet) and did not simultaneously collect meaningful game film. The Trakus
system is based upon the use of electronic beacons that emit pulsed signals that are then collected by
various receivers placed around the tracking area. Unlike the Motion Analysis solution, the Trakus system
could be employed in live events but only determines participant location and not orientation. Furthermore,
their system does not collect game film, either from the overhead or perspective views.

Another beacon approach was also employed in Honey et al.'s U.S. Patent No. 5,912,700 assigned to Fox
Sports Productions, Inc. Honey teaches the inclusion of infrared emitters in the game object to be tracked,
in their example a hockey puck. A series of two or more infrared receivers detects the emissions from the
puck and passes the signals to a tracking system that first triangulates the puck's location and second
automatically directs a filming camera to follow the puck's movement.
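
The triangulation step mentioned above can be illustrated with a minimal two-dimensional sketch. It is not taken from the Honey patent; it simply assumes that each of two receivers at known positions reports a bearing angle toward the emitter, and it intersects the two bearing rays. The function name `triangulate_2d` and the coordinate convention (angles measured counter-clockwise from the x axis) are hypothetical.

```python
import math


def triangulate_2d(p1, bearing1_deg, p2, bearing2_deg):
    """Intersect two bearing rays from receivers p1 and p2 (hypothetical helper).

    Returns the (x, y) intersection, or None when the bearings are parallel
    and no unique position can be recovered.
    """
    d1 = (math.cos(math.radians(bearing1_deg)), math.sin(math.radians(bearing1_deg)))
    d2 = (math.cos(math.radians(bearing2_deg)), math.sin(math.radians(bearing2_deg)))
    # Solve p1 + t*d1 = p2 + s*d2 for t using the 2D cross product.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


if __name__ == "__main__":
    # Two receivers ten units apart, both sighting the same emitter.
    print(triangulate_2d((0.0, 0.0), 45.0, (10.0, 0.0), 135.0))
```

With only one receiver the same emitter could lie anywhere along a single ray, which is why two or more receivers are needed for a position fix.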

It is conceivable that both the Trakus and Fox Sports systems could be combined forming a single system
that could continuously determine the location of all participants and the game object. Furthermore,
building upon techniques taught in the Honey patent, the combined system could be made to automatically
film the game from one or more perspective views. However, this combined system would have several
drawbacks. First, this system can only determine the location of each participant and not their orientation
that is critical for game analysis and automated commentary. Second, the beacon based system is
expensive to implement in that it requires both specially constructed (and therefore expensive) pucks and
to have transmitters inserted into players' helmets. Both of these criteria are impractical at least at the
youth sports levels. Third, the tracking system does not additionally collect overhead game film that can be
combined to form a single continuous view. Additionally, because these solutions are not predicated on
video collection and analysis, they do not address the problems attendant to a multi-camera, parallel
processing image analysis system.

Orad Hi-Tech Systems is assigned U.S. Patent Number 5,923,365 for a Sports Event Video manipulating
system. In this patent by inventor Tamir, a video system is taught that allows an operator to select a game
participant for temporary tracking using a video screen and light pen. Once identified, the system uses
traditional edge detection and other similar techniques to follow the participant from frame-to-frame.
Tamir teaches the use of software based image analysis to track those game participants and objects that
are viewable anywhere within the stream of images being captured by the filming camera. At least because
the single camera cannot maintain a complete view of the entire playing area at all times throughout the
game, there are several difficulties with this approach. Some of these problems are discussed in the
application including knowing when participants enter and exit the current view or when they are
occluding each other. The present inventors prefer the use of a matrix of overhead cameras to first track all
participants throughout the entire playing area and with this information to then gather and segment
perspective film - all without the user intervention required by Tamir.

Orad Hi-Tech Systems is also assigned U.S. Patent Number 6,380,933 B1 for a Graphical Video System.
In the patent, inventor Sharir discloses a system for tracking the three-dimensional position of players and
using this information to drive pre-stored graphic animations enabling remote viewers to view the event in
three dimensions. Rather than first tracking the players from an overhead or substantially overhead view as
preferred by the present inventors, in one embodiment Sharir relies upon a calibrated theodolite that is
manually controlled to always follow a given player. The theodolite has been equipped to project a reticle,
or pattern, that the operator continues to direct at the moving player. As the player moves, the operator
adjusts the angles of the theodolite that are continuously and automatically detected. These detected angles
provide measurements that can locate the player in at least the two dimensions of the plane orthogonal to
the axis of the theodolite. Essentially, this information will provide information about the player's relative
side-to-side location but will not alone indicate how far they are away from the theodolite. Sharir
anticipated having one operator / theodolite in operation per player and is therefore relying upon this
one-to-one relationship to indicate player identity. This particular embodiment has several drawbacks including
imprecise three-dimensional location tracking due to the single line-of-sight, no provision for player
orientation tracking as well as the requirement for significant operator interaction.

In a different embodiment in the same application, Sharir describes what he calls a real-time automatic
tracking and identification system that relies upon a thermal imager boresighted on a stadium camera.
Similar to the depth-of-field problem attendant to the theodolite embodiment, Sharir is using the detected
pitch of the single thermal imaging camera above the playing surface to help triangulate the player's
location. While this can work as a rough approximation, unless there is an exact feature detected on the
player that has been calibrated to the player's height, then the estimation of distance will vary somewhat
based upon how far away the player truly is and what part of the player is assumed to be imaged.
Furthermore, this embodiment also requires potentially one manually operated camera per player to
continuously track the location of every player at all times throughout the game. Again, the present
invention is "fully" automatic especially with respect to participant tracking. In his thermal imaging
embodiment, Sharir teaches the use of a laser scanner that "visits" each of the blobs detected by the
thermal imager. This requires each participant to wear a device consisting of an "electro-optical receiver
and an RF transmitter that transmits the identity of the players to an RF receiver." There are many
drawbacks to the identification via transmitter approach as previously discussed in relation to the Trakus
beacon system. The present inventors prefer a totally passive imaging system as taught in prior co-pending
and issued applications and further discussed herein.
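
The sensitivity of a single-camera, pitch-based distance estimate to the assumed height of the imaged feature can be seen in a short sketch. This is an illustration under simple assumptions (a pinhole camera at a known height, a flat playing surface, pitch measured downward from horizontal), not a description of Sharir's implementation. The function name and the example heights are hypothetical.

```python
import math


def ground_distance(camera_height_ft, pitch_deg, feature_height_ft):
    """Horizontal distance to a feature seen pitch_deg below the horizontal.

    Assumes a flat surface and a feature at feature_height_ft above it.
    """
    drop = camera_height_ft - feature_height_ft
    return drop / math.tan(math.radians(pitch_deg))


if __name__ == "__main__":
    camera_height = 30.0   # assumed mounting height, in feet
    pitch = 20.0           # detected pitch below horizontal, in degrees
    at_skates = ground_distance(camera_height, pitch, 0.0)
    at_helmet = ground_distance(camera_height, pitch, 6.0)
    print(f"feature assumed at surface level: {at_skates:.1f} ft")
    print(f"feature assumed at helmet level:  {at_helmet:.1f} ft")
    print(f"difference: {at_skates - at_helmet:.1f} ft")
```

The same detected pitch yields different distances depending on which part of the player is assumed to have been imaged, and the gap widens as the pitch becomes shallower, that is, as the player moves farther from the camera.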

And finally, in U.S. Patents 5,189,630 and 5,526,479 Barstow et al. disclose a system for broadcasting a
stream of "computer coded descriptions of the (game) sub-events and events" that is transmitted to a
remote system and used to recreate a computer simulation of the game. Barstow anticipates also providing
traditional game video and audio essentially indexed to these "sub-events and events" allowing the viewer
to controllably recall video and audio of individual plays. With respect to the current goals of the present
application, Barstow's system has at least two major drawbacks. First, these "coded descriptions" are
detected and entered into the computer database by an "observer who attends or watches the event and
monitors each of the actions which occurs in the course of the event." The present inventors prefer and
teach a fully automated system capable of tracking all of the game participants and objects thereby creating
an ongoing log of all activities which may then be interpreted through analysis to yield distinct events and
outcomes. The second drawback is an outgrowth of the first limitation. Specifically, Barstow teaches the
pre-establishment of a "set of rules" defining all possible game "events." He defines an "event" as "a
sequence of sub-events constituted by a discrete number of actions selected from a finite set of action
types... Each action is definable by its action type and from zero to possibly several parameters associated
with that action type." In essence, the entire set of "observations" allowable to the "observer who attends
or watches" the game must conform to this pre-established system of interpretation. Barstow teaches that
"the observer enters associated parameters for each action which takes place during the event." Of course,
as previously stated, human observers are extremely limited in their ability to accurately detect and timely
record participant location and orientation data that is of extreme importance to the present inventors' view
of game analysis. Barstow's computer simulation system builds into itself these very limitations.
Ultimately, this stream of human observations that has been constrained to a limited set of action types is
used to "simulate" the game for a remote viewer.
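
The quoted definition of an "event" maps naturally onto a small data structure, sketched below to make the constraint concrete. The class names, the example action types and the parameter format are all hypothetical. Only the shape (events made of sub-events, sub-events made of actions drawn from a finite set, each action carrying zero or more parameters) follows the quoted language.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical finite set of action types; the quoted definition only requires
# that such a set be fixed in advance.
ACTION_TYPES = frozenset({"pass", "shot", "save"})


@dataclass(frozen=True)
class Action:
    action_type: str
    parameters: Tuple[str, ...] = ()  # zero to several parameters

    def __post_init__(self):
        # Anything outside the pre-established set cannot be recorded at all.
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unknown action type: {self.action_type}")


@dataclass
class SubEvent:
    actions: List[Action] = field(default_factory=list)


@dataclass
class Event:
    sub_events: List[SubEvent] = field(default_factory=list)


if __name__ == "__main__":
    play = Event([SubEvent([Action("pass", ("player 12", "player 7")), Action("shot")])])
    print(len(play.sub_events[0].actions), "actions recorded")
```

The validation in `Action` illustrates the limitation discussed above: an observation that does not fit a predefined action type has no representation in the stream.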

With respect to an automated system capable of being "aware" of the game activities, only the teachings of 
the present inventors address an automatic system for: 






• collecting overhead film that can be dually used for both tracking and videoing;

• specifying how this mosaic of overlapping, overhead film can be combined into a single
contiguous and continuous video stream;

• analyzing the video stream to determine both the location and orientation of the participants and
game objects;

    o determining three dimensional information including the height of the game object off of
    the playing surface;

• analyzing the film to determine the identity of participants who are wearing unique affixed
markings such as encoded helmet stickers;

• directing perspective ID cameras to follow detected participants for the purposes of collecting
isolated images of their jersey number and other existing identifying marks;

    o alternatively determining participant identification by performing pattern recognition on
    these key isolated images of participant jersey numbers and other identifying marks;

• directing perspective filming cameras to collect additional video and locate additional body points;

• additionally collecting overhead and perspective video from the non-visible spectrum including
ultraviolet and infrared frequencies that can be used to locate specially placed non-visible
markings placed on a given participant's key body locations;

    o dynamically creating a three-dimensional kinetic body model of participants using the
    tracked locations of the non-visible markings;

• creating separate film and tracking databases from these continuous streams of overhead and
perspective images;

• analyzing the tracking database in real-time to detect and classify individual game events;

• directing perspective videoing cameras to follow detected unfolding events of current or potential
significance from camera angles anticipated to best reveal the game action, and

    o directing these same perspective cameras that might normally capture images at roughly
    30 frames per second to occasionally capture higher 60, 90, 120 or more frames when
    selected key events are unfolding thereby supporting slow and super-slow motion
    replays.

In order to create a complete automatic broadcasting system, additional problems needed to be resolved
such as:

How can a system filming high speed motion that requires fast shutter speeds synchronize itself to the
lighting system?

The typical video camera captures images at the NTSC Broadcast standard of 29.97 frames per second.
Furthermore, most often they use what is referred to as full integration which means that each frame is
basically "exposed" for the maximum time between frames. In the case of 29.97 frames per second, the
shutter speed would be roughly 1/30th of a second. This approach is acceptable for normal continuous
viewing but leads to blurred images when a single frame is frozen for "stop action" or "freeze frame"
viewing. In order to do accurate image analysis on high-speed action, it is both important to capture at least
30 if not 60 frames per second and that each frame be captured with a shutter speed of 1/500th to 1/1000th
of a second. Typically, image analysis is more reliable if there is less image blurring.
Coincident with this requirement for faster shutter speeds to support accurate image analysis, is the issue of
indoor lighting at a sport facility such as an ice hockey rink. A typical rink is illuminated using two
separate banks of twenty to thirty metal halide lamps with magnetic ballasts. Both banks, and therefore all
lamps, are powered by the same alternating current that typically runs at 60 Hz, causing 120 "on-off"
cycles per second. If the image analysis cameras use a shutter speed of 1/120th or greater, for instance
1/500th or 1/1000th of a second, then it is possible that the lamp will essentially be "off" or discharged
when the camera's sensor is being exposed. Hence, what is needed is a way to synchronize the camera's
shutter with the lighting to be certain that it only captures images when the lamps are discharging. The
present application teaches the synchronization of the high-shutter-speed tracking and filming cameras
with the sports venue lighting to ensure maximum, consistent image lighting.
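
The effect described above can be approximated numerically. The sketch below is illustrative only: it assumes the lamp output follows a rectified sinusoid with 120 intensity cycles per second (a simplification of real metal halide behavior), integrates the light collected during one exposure, and compares the darkest and brightest frames obtained when exposures begin at arbitrary points in the lamp cycle. All function names are hypothetical.

```python
import math

LAMP_CYCLES_PER_SECOND = 120.0  # two intensity cycles per 60 Hz mains cycle
LAMP_PERIOD = 1.0 / LAMP_CYCLES_PER_SECOND


def lamp_intensity(t):
    """Assumed lamp output at time t (seconds): a rectified sinusoid in [0, 1]."""
    return abs(math.sin(math.pi * LAMP_CYCLES_PER_SECOND * t))


def exposure_light(start, shutter, steps=2000):
    """Light collected by one exposure, integrated with the midpoint rule."""
    dt = shutter / steps
    return sum(lamp_intensity(start + (i + 0.5) * dt) for i in range(steps)) * dt


def brightness_ratio(shutter, phases=48):
    """Darkest / brightest frame when exposures start at arbitrary lamp phases."""
    values = [exposure_light(k * LAMP_PERIOD / phases, shutter) for k in range(phases)]
    return min(values) / max(values)


def synced_start(n, shutter):
    """Start time that centers an exposure on the n-th lamp intensity peak."""
    peak = (n + 0.5) * LAMP_PERIOD
    return peak - shutter / 2.0


if __name__ == "__main__":
    for shutter in (1 / 30, 1 / 500, 1 / 1000):
        ratio = brightness_ratio(shutter)
        print(f"shutter 1/{round(1 / shutter)} s: min/max brightness = {ratio:.2f}")
```

Under this model a 1/30th second exposure spans four complete lamp cycles and is evenly lit whatever its starting phase, whereas a 1/1000th second exposure can land near a lamp minimum and come out far darker than its neighbors. Exposures started at `synced_start` all sample the same part of the lamp cycle and so receive equal light.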
How can a practical, low-cost system be built to process the simultaneous image flow from
approximately two hundred cameras capturing thirty to one hundred and twenty images per second?
Current technology such as tiiat provided by Motion Analysis Corporation, typically supports up to a 
practical maximum of thirty-two cameras. For an indoor sport such as youth ice hockey, where the ceiling 
is only twenty-five to thirty-feet off the ice sur&ce, the present inventors prefer a system of eigjity or more 
cameras to cover the entire tracking area. Furdiermore, as will be taugjht in the present specification, it is 
beneficial to create two to three separate and conQ>lete overlapping views of the tracking sur&ce so tiiat 
each object to be located appears in at least two views at all times. The resulting oveihead tracking system 
preferably consists of 175 or more cameras. At 630 x 630 pixels per image and three bytes per pixel for 
encoded color information amoimting to 1MB per fiiame, the resulting data stream from a single camera is 
in the range of 3 0MB to 60MB per second. For 1 75 cameras this stream quickly grows to approximately 
125GB per second for a 60 fi:ames per second system. Current PC's can accept around 1GB per second of 
data that they may or may not be able to process in real-time. 
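
The data-rate figures above follow from simple multiplication, as the sketch below shows. The function name `stream_bytes_per_second` is illustrative only; the inputs (630 x 630 pixels, three bytes per pixel, 60 frames per second, 175 cameras) are the values given in the text, and the result is the uncompressed rate before any rounding.

```python
def stream_bytes_per_second(width_px, height_px, bytes_per_pixel, fps, cameras=1):
    """Uncompressed data rate, in bytes per second, for identical cameras."""
    return width_px * height_px * bytes_per_pixel * fps * cameras

frame_bytes = stream_bytes_per_second(630, 630, 3, fps=1)        # one frame
one_camera = stream_bytes_per_second(630, 630, 3, fps=60)
full_grid = stream_bytes_per_second(630, 630, 3, fps=60, cameras=175)

print(f"{frame_bytes / 1e6:.2f} MB per frame")
print(f"{one_camera / 1e6:.1f} MB per second per camera at 60 frames per second")
print(f"{full_grid / 1e9:.1f} GB per second for 175 cameras")
```

The unrounded frame size is about 1.19 MB, which is why 175 cameras at 60 frames per second come to roughly 12.5 GB per second rather than 175 x 60 MB.
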

In any particular sporting event, and especially in ice hockey, the majority of the playing surface will be
empty of participants and game objects at any given time, especially when viewed from overhead. For ice
hockey, any single player is estimated to take approximately five square feet of viewing space. If there
are on average twenty players per team and three game officials, then the entire team could fit into 5 sq. ft.
x 23 players = 115 sq. ft. for all players. A single camera in the present specification is expected to cover 18
ft. by 18 ft. for a total of 324 sq. ft. Hence, all of the players on both teams as well as the game officials
could fit into the equivalent of a single camera view, and therefore generate only 30 MB to 60 MB per
second of bandwidth. This is a reduction of over 200 times from the maximum data stream and would
enable a conventional PC to process the oncoming stream.
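
As a quick check of this estimate, the arithmetic can be written out as below. The count of 23 people and the five square feet per person are the figures used in the text's own calculation. The full-stream rate of roughly 12.5 GB per second is an assumption derived here from 175 cameras at 60 frames per second and about 1.19 MB per frame. The variable names are illustrative.

```python
PEOPLE_COUNTED = 23            # count used in the text's calculation
AREA_PER_PERSON_SQFT = 5       # estimated overhead footprint per person
CAMERA_SIDE_FT = 18            # one overhead camera covers 18 ft by 18 ft

occupied_sqft = PEOPLE_COUNTED * AREA_PER_PERSON_SQFT
camera_view_sqft = CAMERA_SIDE_FT ** 2
fits_in_one_view = occupied_sqft <= camera_view_sqft

full_rate = 12.5e9             # bytes per second, assumed full-grid stream
foreground_rate = 60e6         # bytes per second, one camera view at 60 frames per second
reduction = full_rate / foreground_rate

print(occupied_sqft, camera_view_sqft, fits_in_one_view)
print(f"reduction factor: about {reduction:.0f}x")
```
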

What is needed is a system capable of extracting the moving foreground objects, such as participants and
game objects, in real-time, creating a minimized video image dataset. This minimized dataset is then more
easily analyzed in real-time, allowing the creation of digital metrics that symbolically encode participant
locations, orientations, shapes and identities. Furthermore, this same minimized dataset of extracted
foreground objects may also be reassembled into a complete view of the entire surface as if taken by a
single camera. The present invention teaches a processing hierarchy including a first bank of overhead
camera assemblies feeding full frame data into a second level of intelligent hubs that extract foreground
objects and create corresponding symbolic representations. This second level of hubs then passes the
extracted foreground video and symbolic streams into a third level of multiplexing hubs that joins the
incoming data into two separate streams to be passed off to both a video compression and a tracking
analysis system, respectively.
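
The extraction step performed at the hub level can be sketched as a block-wise comparison against a stored background. This is a simplified illustration, not the method of the specification: it assumes grayscale images held as nested lists, a fixed square block size, and a plain absolute-difference threshold. The name `extract_foreground_blocks` and its parameters are hypothetical.

```python
def extract_foreground_blocks(current, background, block=2, threshold=20):
    """Return (row, col, pixels) for each block that differs from the background.

    A block is kept when any of its pixels differs from the stored background
    by more than `threshold`; all other blocks are discarded.
    """
    rows, cols = len(current), len(current[0])
    kept = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            tile = [row[c0:c0 + block] for row in current[r0:r0 + block]]
            ref = [row[c0:c0 + block] for row in background[r0:r0 + block]]
            changed = any(
                abs(p - q) > threshold
                for tile_row, ref_row in zip(tile, ref)
                for p, q in zip(tile_row, ref_row)
            )
            if changed:
                kept.append((r0, c0, tile))
    return kept

background = [[100] * 4 for _ in range(4)]
current = [row[:] for row in background]
current[2][3] = 200                      # one bright foreground pixel
print(extract_foreground_blocks(current, background))
```

Only the blocks returned by this function would need to travel further up the hierarchy, which is where the bandwidth saving comes from.
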

What is the correct configuration of overhead filming cameras necessary to accurately locate
participants and game objects in three dimensions without significant image distortion?
The approach of filming a sporting event from a fixed overhead view has been the starting point for other
companies, researchers and patent applications. One such research team is the Machine Vision Group
(MVG) based out of the Electrical Engineering Department of the University of Ljubljana, Slovenia.
Their approach, implemented on a handball court, uses two overhead cameras with wide angle lenses to
capture a roughly one hour match at 25 frames per second. The processing and resulting analysis is done
post-event with the help of an operator, "who supervises the tracking process." By using only two cameras,
both the final processing time and the operator assistance are minimized. However, this savings on total
acquired image data necessitated the use of the wide angle lens to cover the larger area of a half court for
each single camera. Furthermore, significant computer processing time is expended to correct for the
known distortion created by the use of wide angle lenses. This eventuality hinders the possibility for
real-time analysis. Without real-time analysis, the overhead tracking system cannot drive one or more
perspective filming cameras in order to follow the game action. What is needed is a layout of cameras that
avoids any lens distortion that would require image analysis to correct. The present invention teaches the
use of a grid of cameras, each with smaller fields-of-view and therefore no required wide-angle lenses.
However, as previously mentioned, the significantly larger number of simultaneous video streams quickly
exceeds existing computer processing limits and therefore requires novel solutions as herein disclosed.
The system proposed by the MVG also appears to be mainly focused on tracking the movements of all the
participants. It does not have the additional goal of creating a viable overhead-view video of the contest
that can be watched similar to any traditional perspective-view game video. Hence, while computer
processing can correct for the severe distortion caused by the camera arrangement choices, the resulting
video images are not equivalent to those familiar to the average sports broadcast viewer. What is needed is
an arrangement of cameras that can provide minimally distorted images that can be combined to create an
acceptable overhead video. The present invention teaches an overlapping arrangement of two to three grids
of cameras where each grid forms a single complete view of the tracking surface. Also taught is the ideal
proximity of adjacent cameras in a single grid, based upon factors such as the maximum player's height
and the expected viewing area comprised by a realistic contiguous grouping of players. The present
specification teaches the need to have significant overlap in adjacent camera views as opposed to no
appreciable overlap such as with the MVG system.

Furthermore, because of the limited resolution of each single camera in the MVG system, the resulting
pixels per inch of tracking area is insufficient to adequately detect foreground objects the size of a handball
or identification markings affixed to the player such as a helmet sticker. What is needed is a layout of
cameras that can form a complete view of the entire tracking surface with enough resolution to sufficiently
detect the smallest anticipated foreground object, such as the handball or a puck in ice hockey. The present
invention teaches just such an arrangement that, in combination with the smaller fields of view per
individual camera and the overlapping of adjacent fields-of-view, in total provides an overall resolution
sufficient for the detection of all expected foreground objects.
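
A rough sense of the resolution involved can be had from figures stated elsewhere in this specification: a 630 x 630 pixel image and an 18 ft by 18 ft camera view. The sketch below assumes, for illustration only, that the 630-pixel side maps evenly onto the 18 ft side at ice level, and it uses the standard 3 inch diameter of an ice hockey puck. The names are hypothetical.

```python
IMAGE_SIDE_PX = 630        # pixels along one side of an overhead image
VIEW_SIDE_FT = 18          # feet of ice covered along the same side
PUCK_DIAMETER_IN = 3.0     # standard ice hockey puck diameter

def pixels_per_inch(image_side_px, view_side_ft):
    """Linear resolution at the ice surface, assuming an even mapping."""
    return image_side_px / (view_side_ft * 12)

ppi = pixels_per_inch(IMAGE_SIDE_PX, VIEW_SIDE_FT)
puck_px = ppi * PUCK_DIAMETER_IN

print(f"{ppi:.2f} pixels per inch at ice level")
print(f"a puck spans about {puck_px:.2f} pixels")
```

Under these assumptions a puck covers several pixels in each direction, which gives a feel for why a grid of narrow fields-of-view can resolve objects that a two-camera wide-angle layout cannot.
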

Similar to the system proposed by MVG, Larson et al. taught a camera based tracking system in U.S.
Patent Number 5,363,297 entitled "Automated Camera-Based Tracking System for Sports Contests."
Larson also proposed a two camera system but in his case one camera was situated directly above the
playing surface while the other was on a perspective view. It was also anticipated that an operator would
be necessary to assist the image analysis processor, as with the MVG solution. Larson further anticipated
using beacons to help track and identify participants so as to minimize the need for the separate operator.
How can perspective filming cameras be controlled so that as they pan, tilt and zoom their collected
video can be efficiently processed to extract the moving foreground from the fixed and moving
background and to support the insertion of graphic overlays?

As with the overhead cameras, the extraction of moving foreground objects is of significant benefit to
image compression of the perspective film. For instance, a single perspective filming camera in color at
VGA resolutions would fill up approximately 90% of a single side of a typical DVD. Furthermore, this
same data stream would take up to 0.7 MB per second to transmit over the Internet, far exceeding current
cable modem capacities. Therefore, the ability to separate the participants moving about in the foreground
from the playing venue forming the background is a critical issue for any broadcast intended especially to
be presented over the Internet and / or to include multiple simultaneous viewing angles. However, this is a
non-trivial problem when considering that the perspective cameras are themselves moving, thus creating
the effect that even the fixed aspects of the background are moving in addition to the moving background
and foreground.

As previously mentioned, the present inventors prefer the use of automated perspective filming cameras
whose pan and tilt angles as well as zoom depths are automatically controlled based upon information
derived in real-time from the overhead tracking system. There are other systems, such as that specified in
the Honey patent, that employ controlled pan / tilt and zoom filming cameras to automatically follow the
game action. However, the present inventors teach the additional step of limiting individual frame captures
to only occur at a restricted set of allowed camera angles and zoom depths. For each of these allowed angles
and depths, a background image will be pre-captured while no foreground objects are present; for example
at some time when the facility is essentially empty. These pre-captured background images are then stored
for later recall and comparison during the actual game filming. As the game is being filmed by each
perspective camera, the overhead system will continue to restrict images to the allowed, pre-determined
angles and depths. For each current image captured, the system will look up the appropriate stored
background image matching the current pan / tilt and zoom settings. This pre-stored, matched background
is then subtracted from the current image, thereby efficiently revealing any foreground objects, regardless
of whether or not they are moving. In effect, it is as if the perspective cameras were stationary similar to
the overhead cameras.
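
The lookup-and-subtract step can be illustrated with a minimal sketch. It assumes, for illustration only, that the allowed camera settings are discrete (pan, tilt, zoom) tuples used as dictionary keys, that images are grayscale nested lists, and that a fixed threshold separates foreground from background. The name `foreground_mask` is hypothetical.

```python
def foreground_mask(current, backgrounds, pan, tilt, zoom, threshold=25):
    """Subtract the pre-captured background for this camera setting.

    `backgrounds` maps each allowed (pan, tilt, zoom) setting to the image
    captured at that setting while the venue was empty. The result is True
    wherever the current image departs from that stored background.
    """
    key = (pan, tilt, zoom)
    if key not in backgrounds:
        raise KeyError(f"{key} is not one of the allowed camera settings")
    background = backgrounds[key]
    return [
        [abs(c - b) > threshold for c, b in zip(cur_row, bg_row)]
        for cur_row, bg_row in zip(current, background)
    ]

backgrounds = {(10, -5, 2): [[50, 50], [50, 50]]}
current = [[50, 120], [52, 50]]
print(foreground_mask(current, backgrounds, pan=10, tilt=-5, zoom=2))
```

Because frames are only captured at settings that have a stored background, the lookup never needs to interpolate between views, which is the point of restricting the allowed angles and depths.
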

While typical videoing cameras maintain their constant NTSC broadcast rate of 29.97 frames per second,
or some multiple thereof, the perspective cameras in the present invention will not follow this standardized
rate. In fact, under certain circumstances they will not have consistent, fixed intervals between images such
as 1/30th of a second. The actual capture rate is dependent upon the speed of pan, tilt and zoom motions
in conjunction with the allowed imaging angles and depths. Hence, the present inventors teach the use of
an automatically controlled videoing camera that captures images at an asynchronous rate. In practice,
these cameras are designed to maintain an average number of images in the equivalent range such as 30, 60
or 90 frames per second. After capturing at an asynchronous rate, these same images are then synchronized
to the desired output standard, such as NTSC. The resulting minimal time variations between frames are
anticipated to be unintelligible to the viewer. The present inventors also prefer synchronizing these same
cameras to the power lines driving the venue lighting, thereby supporting higher speed image captures.
These higher speed captures will result in crisper images, especially during slow or freeze action, and will
also support better image analysis.
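
One simple way to picture the re-synchronization step is to assign, to each fixed output slot, the captured frame whose timestamp is nearest. This nearest-frame rule is an assumption made for illustration; the specification does not state which rule is used. The function name `resample_to_fixed_rate` is hypothetical.

```python
from bisect import bisect_left

def resample_to_fixed_rate(capture_times, output_fps, n_output):
    """Pick, for each fixed-rate output slot, the index of the nearest capture.

    `capture_times` is a sorted, non-empty list of capture timestamps in
    seconds taken at irregular (asynchronous) intervals.
    """
    chosen = []
    for k in range(n_output):
        slot_time = k / output_fps
        i = bisect_left(capture_times, slot_time)
        # Compare the neighbours on either side of the slot time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(capture_times)]
        chosen.append(min(candidates, key=lambda j: abs(capture_times[j] - slot_time)))
    return chosen

# Irregular captures averaging roughly 30 frames per second
captures = [0.0, 0.030, 0.071, 0.099, 0.135]
print(resample_to_fixed_rate(captures, output_fps=30, n_output=4))
```

For an NTSC target the same function could be called with `output_fps=29.97`; the timing error per frame is then bounded by half the largest gap between captures.
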

The present inventors also teach a method for storing the pre-captured backgrounds from the restricted
camera angles and zoom depths as a single panoramic. At any given moment, the current camera pan and
tilt angles as well as zoom depth can be used to index into the panoramic dataset in order to create a
single-frame background image equivalent to the current view. While the panoramic approach is expected to
introduce some distortion issues, it has the benefit of greatly reducing the required data storage for the
pre-captured backgrounds.
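
Indexing into a panoramic background can be sketched as computing a crop rectangle from the current pan, tilt and zoom. Everything about the layout below is an assumption made for illustration: a flat panorama with a constant number of pixels per degree, a field of view that shrinks in proportion to the zoom factor, and the particular default values. The sketch ignores the projection distortion that the text acknowledges. The name `panorama_window` is hypothetical.

```python
def panorama_window(pan_deg, tilt_deg, zoom,
                    px_per_deg=10.0, base_fov_deg=(40.0, 30.0),
                    pan_min_deg=-90.0, tilt_max_deg=30.0):
    """Return (left, top, right, bottom) pixel bounds of the current view.

    The panorama's left edge corresponds to `pan_min_deg` and its top edge to
    `tilt_max_deg`; pan increases to the right and tilt increases upward.
    """
    fov_h = base_fov_deg[0] / zoom       # horizontal field of view in degrees
    fov_v = base_fov_deg[1] / zoom       # vertical field of view in degrees
    left = round((pan_deg - fov_h / 2 - pan_min_deg) * px_per_deg)
    right = round((pan_deg + fov_h / 2 - pan_min_deg) * px_per_deg)
    top = round((tilt_max_deg - (tilt_deg + fov_v / 2)) * px_per_deg)
    bottom = round((tilt_max_deg - (tilt_deg - fov_v / 2)) * px_per_deg)
    return left, top, right, bottom

print(panorama_window(pan_deg=0.0, tilt_deg=0.0, zoom=2.0))
```

The returned rectangle would then be cut from the stored panorama and scaled to the frame size, replacing one stored image per allowed setting with a single shared dataset.
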

In addition to removing the fixed background from every current image of a perspective camera, there will
be times when the current view includes a moving background such as spectators in the surrounding
stands. Traditional methods for removing this type of background information include processing and time
intensive intra and inter-frame image analysis. The present inventors prefer segmenting each captured
image from a perspective camera into one to two types of background regions based upon a pre-measured
three-dimensional model of the playing venue and the controlled angles and depth of the current image.
Essentially, by knowing where each camera is pointed with respect to the three-dimensional model at any
given moment, the system can always determine which particular portion of the playing venue is in view.
In some cases, this current view will be pointed wholly onto the playing area of the facility as opposed to
some portion of the playing area and surrounding stands. In this case, the background is of the fixed type
only and simple subtraction between the pre-stored background and the current image will yield the
foreground objects. In the alternate case, where at least some portion of the current view includes a region
outside of the playing area, the contiguous pixels of the current image corresponding to this second
type of region can be effectively determined in the current image via the three-dimensional model. Hence,
the system will know which portion of each image taken by a perspective filming camera covers a portion
of the venue surrounding the playing area. It is in the surrounding areas that moving background objects,
such as spectators, may be found.

The present inventors further teach a method for employing the information collected by the overhead
cameras to create a topological three-dimensional profile of any and all participants who may happen to be
in the same field-of-view of the current image. This profile will serve to essentially cut out the participant's
profile as it overlays the surrounding area that may happen to be in view behind them. Once this
topological profile is determined, all pixels residing in the surrounding areas that are determined to not
overlap a participant (i.e. they are not directly behind the player) are automatically dropped. This
"hardware" assisted method of rejecting pixels that are not either a part of the fixed background or a
tracked participant offers considerable efficiency over traditional software methods.

After successfully removing, or segmenting, the image foreground from its fixed and moving backgrounds,
the present inventors teach the limited encoding and transmission of just the foreground objects. This
reduction in overall information to be transmitted and / or stored yields expected Internet transfer rates of
less than 50 KB and full film storage of 0.2 GB, or only 5% of today's DVD capacity. Upon decoding,
several options are possible including the reinstatement of the fixed background from a panoramic
reconstruction pre-stored on the remote viewing system. It is anticipated that the look of this recombined
image will be essentially indistinguishable from the original image. All that will be missing is minor
background surface variations that are essentially insignificant and images of the moving background such
as the spectators. The present inventors prefer the use of state of the art animation techniques to add a
simulated crowd to each individual decoded frame. It is further anticipated that these same animation
techniques could be both acceptable and preferable for recreating the fixed background as opposed to using
the pre-transmitted panoramic.

With respect to the audio coinciding to the game film, the present inventors anticipate either transmitting
an authentic capture or alternatively sending a synthetic translation of at least the volume and tonal
aspects of the ambient crowd noise. This synthetic translation is expected to be of particular value for
broadcasts of youth games where there tends to be smaller crowds on hand. Hence, as the game transpires,
the participants are extracted from the playing venue and transmitted along with an audio mapping of the
spectator responses. On the remote viewing system, the game may then be reconstructed with the original
view of the participants overlaid onto a professional arena, filled with spectators whose synthesized
cheering is driven by the original spectators.

With respect to the recreation of the playing venue background on the remote viewing system, both the
"real-image" and "graphically-rendered" approaches have the additional advantage of being able to easily
overlay advertisements. Essentially, after recreating the background using either actual pre-stored images
of the venue or graphic animations, advertisements can be placed in accordance with the pre-known
three-dimensional map and transmitted current camera angle being displayed. After this, the
foreground objects are overlaid, forming a complete reconstruction. There are several other inventors who
have addressed the need for overlaying advertisements onto sports broadcasts. For instance, there are
several patents assigned to Orad Hi-Tech Systems, LTD including U.S. Patents 5,903,317, 6,191,825 B1,
6,208,386 B1, 6,292,227 B1, 6,297,853 B1 and 6,384,871 B1. They are directed towards "apparatus for
automatic electronic replacement of a billboard in a video image." The general approach taught in these
patents limits the inserted advertisements to those areas of the image determined to already contain
existing advertising. Furthermore, these systems are designed to embed these replacement advertisements
in the locally encoded broadcast that is then transmitted to the remote viewer. This method naturally
requires transmission bandwidth for the additional advertisements now forming a portion of the
background (which the present inventors do not transmit).

The present inventors prefer to insert these advertisements post transmission on the remote viewing device
as a part of the decoding process. Advertisements can be placed anywhere, either in the restored life-like or
graphically animated background. If it is necessary to place a specific ad directly on top of an existing ad
in the restored life-like image, the present inventors prefer a calibrated three-dimensional venue model that
describes the player area and all important objects, hence the location and dimensions of billboards. This
calibrated three-dimensional model is synchronized to the same local coordinate system used for the
overhead and perspective filming cameras. As such, the camera angle and zoom depth transmitted with
each sub-frame of foreground information not only indicates which portion of the background must be
reconstructed according to the three-dimensional map, but also indicates whether or not a particular
billboard is in view and should be overlaid with a different ad.
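
Deciding whether a billboard is in view reduces to comparing the direction from the camera to the billboard with the camera's current pointing direction. The sketch below is a simplified illustration: it assumes the camera position and billboard centre share one venue coordinate system (x and y on the floor, z upward), that pan is measured from the x axis and tilt from the horizontal, and that the zoom depth has already been converted to horizontal and vertical field-of-view angles. The name `billboard_in_view` is hypothetical.

```python
import math

def billboard_in_view(camera_xyz, pan_deg, tilt_deg, fov_h_deg, fov_v_deg, billboard_xyz):
    """Return True when the billboard centre lies inside the camera's view."""
    dx = billboard_xyz[0] - camera_xyz[0]
    dy = billboard_xyz[1] - camera_xyz[1]
    dz = billboard_xyz[2] - camera_xyz[2]
    bearing = math.degrees(math.atan2(dy, dx))                     # direction on the floor plane
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))   # angle above horizontal
    pan_error = (bearing - pan_deg + 180.0) % 360.0 - 180.0        # wrapped to [-180, 180)
    tilt_error = elevation - tilt_deg
    return abs(pan_error) <= fov_h_deg / 2 and abs(tilt_error) <= fov_v_deg / 2

camera = (0.0, 0.0, 10.0)
billboard = (30.0, 0.0, 2.0)
print(billboard_in_view(camera, 0.0, -15.0, 20.0, 15.0, billboard))
print(billboard_in_view(camera, 45.0, -15.0, 20.0, 15.0, billboard))
```

A fuller version would test the billboard's corners rather than its centre so that partially visible boards are handled, but the centre test is enough to show how the transmitted angles drive the decision.
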

Other teachings exist for inserting static or dynamic images into a live video broadcast, which covers a
portion of the purposes of the present Automated Sports Broadcasting System. For instance, in U.S. Patent
No. 6,100,925 assigned to Princeton Video Image, Inc., Rosser et al. disclose a method that relies upon a
plurality of pre-known landmarks within a given venue that have been calibrated to a local coordinate
system in which the current view of a filming camera can be sensed and calculated. Hence, as the
broadcast camera freely pans, tilts and zooms to film a game, its current orientation and zoom depth is
measured and translated via the local coordinate system into an estimate of its field-of-view. By referring
to the database of pre-known landmarks, the system is able to predict when and where any given landmark
should appear in any given field-of-view. Next, the system employs pattern matching between the pixels in
the current image anticipated to represent a landmark and the pre-known shape, color and texture of the
landmark. Once the matching of one or more landmarks is confirmed, the system is then able to insert the
desired static or dynamic images. In an alternative embodiment, Rosser suggests using transmitters
embedded in the game object in order to triangulate position, in essence creating a moving landmark. This
transmitter approach for tracking the game object is substantially similar to at least that of Trakus and
Honey.

Like the Orad patents for inserting advertisements, the teachings of Rosser differ from the present
invention since the inserted images are added to the encoded broadcast prior to transmission, therefore
taking up needed bandwidth. Furthermore, like the Trakus and Honey solutions for beacon based object
tracking, Rosser's teachings are not sufficient for tracking the location and orientation of multiple
participants. At least these, as well as other drawbacks, prohibit the Rosser patent from use as an automatic
broadcasting system as defined by the present inventors.

With the similar purpose of inserting a graphic into live video, in U.S. Patent No. 6,597,406 B2 assigned to
Sportvision, Inc., inventor Gloudeman teaches a system for combining a three-dimensional model of the
venue with the detected camera angle and zoom depth. An operator could then interact with the
three-dimensional model to select a given location for the graphic to be inserted. Using the sensed camera pan
and tilt angles as well as zoom depth, the system would then transform the selected three-dimensional
location into a two-dimensional position in each current video frame from the camera. Using this
two-dimensional position, the desired graphic is then overlaid onto the stream of video images. As with other
teachings, Gloudeman's solution inserts the graphic onto the video frame prior to encoding, again taking
up transmission bandwidth. The present inventors teach a method for sending this insertion location
information along with the extracted foreground and current camera angles and depths associated with
each frame or sub-frame. The remote viewing system then decodes these various components with
pre-knowledge of both the three-dimensional model as well as the background image of the venue. During this
decode step, the background is first reconstructed from a saved background image database or panorama,
after which advertisements and / or graphics are either placed onto pre-determined locations or inserted
based upon some operator input. And finally, the foreground is overlaid, creating a completed image for
viewing. Note that the present inventors anticipate that the information derived from participant and game
object tracking will be sufficient to indicate where graphics should be inserted, thereby eliminating the need
for operator input as specified by Gloudeman.
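
The three-stage decode order described above (background, then advertisements or graphics, then foreground) can be sketched as a simple compositing routine. The data layout is assumed for illustration: images are nested lists of pixel values, each overlay is given as a top-left position plus a rectangular patch, and the foreground uses `None` for pixels that were not transmitted. The name `compose_frame` is hypothetical.

```python
def compose_frame(background, overlays, foreground):
    """Rebuild a viewable frame on the remote system.

    Order matters: overlays are drawn onto the reconstructed background first,
    and the transmitted foreground is drawn last so that participants pass in
    front of the inserted advertisements or graphics.
    """
    frame = [row[:] for row in background]            # 1. reconstructed background
    for top, left, patch in overlays:                 # 2. advertisements / graphics
        for r, patch_row in enumerate(patch):
            for c, value in enumerate(patch_row):
                frame[top + r][left + c] = value
    for r, fg_row in enumerate(foreground):           # 3. extracted foreground
        for c, value in enumerate(fg_row):
            if value is not None:
                frame[r][c] = value
    return frame

background = [[0, 0, 0] for _ in range(3)]
overlays = [(0, 1, [[5, 5]])]                         # a 1 x 2 patch at row 0, column 1
foreground = [[None, None, 9], [None, None, None], [None, None, None]]
for row in compose_frame(background, overlays, foreground):
    print(row)
```

In the example the foreground pixel at row 0, column 2 replaces the overlay beneath it, which mirrors a player skating in front of an inserted ad.
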

How can a system track and identify players without using any special markings?
The governing bodies of many sports throughout the world, especially at the amateur levels, do not allow
any foreign objects, such as electronic beacons, to be placed upon the participants. What is needed is a
system that is capable of identifying participants without the use of specially affixed markings or
beacons. The present inventors are not aware of any systems that are currently able to identify participants
using the same visual markings that are available to human spectators, such as a jersey team logo, player
number and name. The present application builds upon the prior applications included by reference to
show how the location and orientation information determined by the overhead cameras can be used to
automatically control perspective view cameras so as to capture images of the visual markings. Once
captured, these markings are then compared to a pre-known database thereby allowing for identification
via pattern matching. This method will allow for the use of the present invention in sports where
participants do not wear full equipment with headgear such as basketball and soccer.

How can a single camera be constructed to create simultaneous images in the visible and non-visible
spectrums to facilitate the extraction of the foreground objects followed by the efficient locating of any
non-visible markings?

As was first taught in prior applications of the present inventors, it is possible to place marks in the form of
coatings onto surfaces such as a player's uniform or game equipment. These coatings can be specially
formulated to substantially transmit electromagnetic energy in the visible spectrum from 380 nm to 770 nm
while simultaneously reflecting or absorbing energies outside of this range. By transmitting the visible
spectrum, these coatings are in effect "not visually apparent" to the human eye. However, by either
absorbing or reflecting the non-visible spectrum, such as ultraviolet or infrared, these coatings can become
detectable to a machine vision system that operates outside of the visible spectrum. Among other
possibilities, the present inventors have anticipated placing these "non-apparent" markings on key spots of
a player's uniform such as their shoulders, elbows, waist, knees, ankles, etc. Currently, machine vision
systems do exist to detect the continuous movement of body joint markers at least in the infrared spectrum.
Two such manufacturers known to the present inventors are Motion Analysis Corporation and Vicon.
However, in both companies' systems, the detecting cameras have been filtered to only pass the infrared
signal. Hence, the reflected energy from the visible spectrum is considered noise and eliminated before it
can reach the camera sensor.

The present inventors prefer a different approach that places what is known as a "hot mirror" in front of the
camera lens that acts to reflect the infrared wavelengths above 700 nm off at a 45° angle. The reflected
infrared energy is then picked up by a second imaging sensor responsive to the near-infrared wavelengths.
The remaining wavelengths below 700 nm pass directly through the "hot mirror" to the first imaging sensor.
Such an apparatus would allow the visible images to be captured as game video while simultaneously
creating an exactly overlapping stream of infrared images. This non-visible spectrum information can then
be separately processed to pinpoint the location of marked body joints in the overlapped visible image.
Ultimately, this method is an important tool for creating a three-dimensional kinetic model of each
participant. The present inventors anticipate optionally including these motion models in the automated
broadcast. This kinetic model dataset will require significantly less bandwidth than the video streams and
can be used on the remote system to drive an interactive, three-dimensional graphic animation of the
real-life action.
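
Because the infrared and visible images overlap exactly, a marker located in the infrared frame has the same pixel coordinates in the visible frame. Locating the markers can be sketched as thresholding followed by grouping of touching bright pixels. This is an illustrative simplification, not the method of the specification: it assumes a grayscale infrared image held as nested lists, a fixed brightness threshold and four-connected grouping. The name `marker_centroids` is hypothetical.

```python
def marker_centroids(ir_image, threshold=200):
    """Return the (row, col) centroid of each bright blob in the infrared image."""
    rows, cols = len(ir_image), len(ir_image[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or ir_image[r][c] < threshold:
                continue
            # Flood fill to collect every bright pixel touching this one.
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and ir_image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            centroids.append((sum(p[0] for p in pixels) / len(pixels),
                              sum(p[1] for p in pixels) / len(pixels)))
    return centroids

ir = [
    [255, 255, 0, 0, 0],
    [255, 255, 0, 0, 0],
    [0,   0,   0, 0, 0],
    [0,   0,   0, 0, 255],
]
print(marker_centroids(ir))
```

Each centroid can be used directly as a joint position in the matching visible frame, with no registration step between the two sensors.
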

How can spectators be tracked and filmed, and the playing venue be audio recorded in a way that allows
this additional non-participant video and audio to be meaningfully blended into the game broadcast?
For many sports, especially at the youth levels where the spectators are mostly parents and friends, the
story of a sporting event can be enhanced by recording what is happening around and in support of the
game. As mentioned previously, creating a game broadcast is an expensive endeavor that is typically
reserved for professional and elite level competition. However, the present inventors anticipate that a
relatively low cost automated broadcast system that delivered its content over the Internet could open up
the youth sports market. Given the fact that most youth sports are attended by the parents and guardians of
the participants, the spectator base for a youth contest represents a potential source of interesting video and
audio content. Currently, no system exists that can automatically associate the parent with the participant
and subsequently track the parent's location throughout the contest. This tracking information can then be
used to optionally video any given parent(s) as the game tracking system becomes aware that their child /
participant is currently involved in a significant event.

Several companies have either developed or are working on radio frequency (RF) and ultra-wide band
(UWB) wearable tag tracking systems. These RF and UWB tags are self-powered and uniquely encoded
and can, for instance, be worn around an individual spectator's neck. As the fan moves about in the stands
or area surrounding the game surface, a separate tracking system will direct one or more automatic pan /
tilt / zoom filming cameras towards anyone, at any time. The present inventors envision a system where
each parent receives a uniquely encoded tag to be worn during the game, allowing images of them to be
captured during plays their child is determined to be involved with. This approach could also be used to
track coaches or VIPs and is subject to many of the same novel apparatus and methods taught herein for
filming the participants.

How can the official indications of game clock start and stop times be detected to allow for the
automatic control of the scoreboard and for time stamping of the filming and tracking databases?
The present invention for automatic sports broadcasting is discussed primarily in relation to the sport of ice
hockey. In this sport as in many, the time clock is essentially controlled by the referees. When the puck is
dropped on a face-off, the official game clock is started and whenever a whistle is blown or a period ends,
the clock is stopped. Traditionally, especially at the youth level, a scorekeeper is present monitoring the
game to watch for puck drops and listen for whistles. In most of the youth rinks this scorekeeper is
working a console that controls the official scoreboard and clock. The present inventors anticipate
interfacing this game clock to the tracking system such that at a minimum, as the operator starts and stops
the time, the tracking system receives appropriate signals. This interface also allows the tracking system to
confirm official scoring such as shots, goals and penalties. It is further anticipated that this interface will
also accept player numbers indicating official scoring on each goal and penalty.
The present inventors are aware of at least one patent proposing an automatic interface between a referee's
whistle and the game scoreboard. In U.S. Patent No. 5,293,354, Costabile teaches a system that is
essentially tuned to the frequency of the properly blown whistle. This "remotely actuatable sports timing
system" includes a device worn by a referee that is capable of detecting the whistle's sound waves and
responding by sending off its own RF signal to start / stop the official clock. At least four drawbacks exist
to Costabile's solution. First, the referee is required to wear a device which, upon falling, could cause
serious injury to the referee. Second, while this device can pick up the whistle sound, it is unable to
distinguish which of up to three possible referees actually blew the whistle. Third, if the
airflow through the whistle is not adequate to create the target detection frequencies, then Costabile's
receiver may "miss" the clock stoppage. And finally, it does not include a method for detecting when a puck
is dropped, which is how the clock is started for ice hockey.

The present inventors prefer an alternate solution to Costabile that includes a miniaturized air-flow detector
in each referee's whistle. Once air-flow is detected, for instance as it flows across an internal pinwheel, a
unique signal is generated and automatically transmitted to the scoreboard interface thereby stopping the
clock. Hence, the stoppage is accounted to only one whistle and therefore referee. Furthermore, the system
is built into the whistle and carries no additional danger of harm to the referee upon falling. In tandem with
the air-flow detecting whistle, the present inventors prefer using a pressure sensitive band worn around two
to three fingers of the referee's hand. Once a puck is picked up by the referee and held in his palm, the
pressure sensor detects the presence of the puck and lights up a small LED for verification. After the
referee sees the lit LED, he then is ready and ultimately drops the puck. The pressure on the band is
released and a signal is sent to the scoreboard interface starting the official clock.
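
The start / stop behavior described above can be illustrated with a minimal sketch. It assumes each whistle and finger band reports a referee identifier and a timestamp; the class name `GameClock`, its method names and the referee identifiers are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class GameClock:
    """Official game clock driven by referee signals (hypothetical interface)."""
    running: bool = False
    elapsed: float = 0.0          # seconds of game time played so far
    _started_at: float = 0.0      # timestamp of the most recent clock start
    log: list = field(default_factory=list)  # (timestamp, event, referee_id)

    def puck_released(self, referee_id: str, t: float) -> None:
        # Pressure on the finger band is released: the puck was dropped, start the clock.
        if not self.running:
            self.running = True
            self._started_at = t
            self.log.append((t, "start", referee_id))

    def whistle_airflow(self, referee_id: str, t: float) -> None:
        # Air-flow detected in one referee's whistle: stop the clock.
        if self.running:
            self.elapsed += t - self._started_at
            self.running = False
            self.log.append((t, "stop", referee_id))


clock = GameClock()
clock.puck_released("ref-1", t=10.0)
clock.whistle_airflow("ref-2", t=42.5)
clock.whistle_airflow("ref-3", t=43.0)  # ignored: the clock is already stopped
```

Because each signal carries its own identifier, the log attributes every stoppage to exactly one referee, and the logged timestamps can later serve to index the captured film.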



WO 2005/099423



CA 02563478 2006-10-16 



PCT/US2005/013132 



-17- 

By automatically detecting clock start and stop times as well as picking up official game scoring through
a scoreboard interface, the present invention uses this information to help index the captured game film.

How can tracking data determined by video image analysis be used to create meaningful statistics and
performance metrics that can be compared to subjective observation thereby providing for positive
feed-back to influence the entire process?

Especially for ice hockey, many of the player movements in sports are too fast and too numerous to
quantify by human based observation. In practice, game observers will look to quantify a small number of
well-defined, easily observed events such as "shots" or "hits." Beyond this, many experienced observers
will also make qualitative assessments concerning player and team positioning, game speed and intensity,
etc. This latter set of observations comes without verifiable measurement. At least the Trakus and Orad
systems have anticipated the benefit of a stream of verifiable, digitally encoded measurements. This stream
of digital performance metrics is expected to provide the basis for summarization into a newer class of
meaningful statistics. However, not only are there significant drawbacks to the apparatus and methods
proposed by Trakus and Orad for collecting these digital metrics, there is at least one key measurement that
is missing. Specifically, the present inventors teach the collection of participant orientation in addition to
location and identity. Furthermore, the present invention is the only system to teach a method applicable
to live sports for collecting continuous body joint location tracking above and beyond participant location
tracking.

This continuous accumulation of location and orientation data recorded by participant identity thirty times
or more per second yields a significant database for quantifying and qualifying the sporting event. The
present inventors anticipate submitting a continuation of the present invention teaching various methods
and steps for translating these low level measurements into meaningful higher level game statistics and
qualitative assessments. While the majority of these teachings will not be addressed in the present
application, what is covered is the method for creating a feed-back loop between a fully automated
"objective" game assessment system and a human based "subjective" system. Specifically, the present
inventors teach a method of creating "higher level" or "judgment-based" assessments that can be common
to both traditional "subjective" methods and newer "objective" based methods. Hence, after viewing a
game, both the coaching staff and the tracking system rate several key aspects of team and individual play.
Theoretically, both sets of assessments should be relatively similar. The present inventors prefer capturing
the coaches' "subjective" assessments and using them as feed-back to automatically adjust the weighting
formulas used to drive the underlying "objective" assessment formulas.
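
The specification does not state how the weighting formulas are adjusted. As one possible illustration, the sketch below assumes the "objective" assessment is a weighted sum of tracked metrics and applies a least-mean-squares update, a technique chosen here purely as an example, to move the weights toward a coach's rating. The function names, metric values and learning rate are all hypothetical.

```python
def objective_score(weights, metrics):
    """Objective assessment modeled as a weighted sum of metrics (an assumption)."""
    return sum(w * m for w, m in zip(weights, metrics))


def adjust_weights(weights, metrics, coach_rating, rate=0.05):
    """One least-mean-squares step that moves the objective score toward the coach's rating."""
    error = coach_rating - objective_score(weights, metrics)
    return [w + rate * error * m for w, m in zip(weights, metrics)]


weights = [0.5, 0.5]
metrics = [1.0, 2.0]      # hypothetical normalized game metrics
coach_rating = 2.5        # hypothetical subjective rating for the same aspect of play

before = abs(coach_rating - objective_score(weights, metrics))
for _ in range(20):
    weights = adjust_weights(weights, metrics, coach_rating)
after = abs(coach_rating - objective_score(weights, metrics))
```

With these values each step shrinks the disagreement by a constant factor, so repeated feed-back brings the two assessments together while leaving the weights available for inspection.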

Most of the above listed references are addressing tasks or portions of tasks that support or help to
automate the traditional approach to creating a sports broadcast. Some of the references suggest solutions
for gathering new types of performance measurements based upon automatic detection of player and / or
game object movements. What is needed is an automatic integrated system combining solutions to the
tasks of:

• tracking official game start / stop times, calls and scoring; 




• automatically tracking participant and game object movement using a multiplicity of substantially
overhead viewing cameras;

• automatically assembling a single composite overhead view of the game based upon the video
images captured by the tracking system;

• collecting video from one or more perspective view cameras that are automatically directed to
follow the game action based upon the determined participant and game object movement;

• automatically collecting game audio and creating matched volume and tonal mappings;

• analyzing participant and game object movement to create game statistics and performance
measurements forming a stream of game metrics;

• automatically creating performance descriptor tokens based upon the game metrics describing the
important game activities;

• dynamically assembling combinations of the video, game metrics, performance tokens and audio
information into an encoded broadcast based upon remote viewer directives;

• transmitting the broadcast and receiving back interactive viewer directives;

• decoding the broadcast into a stream of video and audio signals capable of being presented on the
viewing device, where

• the background may be chosen by the viewer to match either the original or a different facility, in
either "natural" or "animated" formats;

• the overhead game view and a multiplicity of perspective views are available under user direction
in either video, gradient "colorized line-art" or symbolic formats;

• standard and custom advertisements are inserted, preferably based upon the known profile of the 
viewer, as separate video / audio clips or graphic overlays; 

• statistics, performance measurements and other game analysis are graphically overlaid onto the 
generated video; 

• audio game commentary is automatically synthesized based upon the performance tokens, and 

• crowd noise is automatically synthesized based upon the matched volume and tonal mappings as 
an alternative to the "natural" recorded game audio. 

When taken together, the individual sub-systems for performing these tasks become an Automatic Event
Videoing, Tracking and Content Generation System.

Given the current state of the art in CMOS image sensors, Digital Signal Processors (DSP's), Field
Programmable Gate Arrays (FPGA's) and other digital electronic components as well as general computing
processors, image optics, and software algorithms for performing image segmentation and analysis, it is
possible to create a massively parallel, reasonably priced machine vision based sports tracking system.
Also, given the additional state of the art in mechanical pan / tilt and electronic zoom devices for use with
videoing cameras along with algorithms for encoding and decoding highly segmented and compressed
video, it is possible to create a sophisticated automatic filming system controlled by the sports tracking
system. Furthermore, given state of the art low cost computing systems, it is possible to break down and
analyze the collected player and game object tracking information in real-time forming a game metrics and
descriptor database. When combined with advancements in text-to-speech synthesis, it is then possible to
create an Automatic Event Videoing, Tracking and Content Generation System capable of recording,
measuring, analyzing, and describing in audio the ensuing sporting event in real-time. Using this
combination of apparatus and methods provides opportunities for video compression significantly
exceeding current standards thereby providing opportunities for realistically distributing the resulting
sports broadcast over non-traditional mediums such as the Internet.

While the present invention will be specified in reference to one particular example of sports broadcasting,
as will be described forthwith, this specification should not be construed as a limitation on the scope of the
invention, but rather as an exemplification of the preferred embodiments thereof. The inventors envision
many related uses of the apparatus and methods herein disclosed, only some of which will be mentioned in
the conclusion to this application's specification. For purposes of teaching the novel aspects of the
invention, the example of a sport to be automatically broadcast is that of an ice-hockey game.

Accordingly, the underlying objects and advantages of the present invention are to provide sub-systems in
support of, and comprising an Automatic Event Videoing, Tracking and Content Generation System with
the following capabilities:

1. tracking official game start / stop times, calls and scoring through:

o the use of a referee's whistle capable of transmitting a uniquely encoded identification signal upon
the detection of airflow;

o the use of a band to be worn over the fingers that is capable of transmitting a uniquely encoded
identification signal upon the sensing of pressure when the game object, such as a puck, is either
picked up or released, and

o the interfacing of the official game scoring data collection device that is typically used to control
the scoreboard. 

2. automatically tracking participant and game object movement using a multiplicity of substantially
overhead viewing cameras:

o by first detecting and following the participant and game object shapes from a substantially
overhead, fixed camera matrix capable of both tracking and filming, and:

■ synchronizing these tracking and filming cameras to the power cycles of the venue lighting
system in order to ensure maximum, consistent image-to-image lighting;

■ where the fixed overhead filming cameras first capture an image of the background known
to be absent of foreground objects, the background image of which can then be used during
game filming to support the real-time extraction of any participants and game objects
(collectively referred to as foreground objects) that may be traversing the background so
that they may be efficiently analyzed;

• where the fixed overhead cameras stream their data into image extracting hubs whose
purpose is at least to perform this extraction of the foreground from the background,
also referred to as segmentation, in real-time prior to multiplexing the resulting
extracted foreground objects into a single minimal stream to be passed on to an
analysis computer;

o so that the larger stream of video data emanating from the multiplicity of
overhead cameras can be reduced in total pixel area to a volume of data capable of
being received and processed by a typical computer system;

• where a multiplicity of image extracting hubs stream their data into multiplexing hubs
whose purpose is to join together the incoming streams of extracted foreground
objects into a single stream for presentation to another multiplexing hub or an analysis
computer;

o so that the analysis computer is capable of receiving the total multiplicity of
streams as a reduced number of streams acceptable into its typical number of
input paths;

■ where the tracking information determined for these foreground objects at least includes the
continuous location and orientation of each participant and game object while they are
within the field of play;

■ using markings such as uniquely encoded helmet stickers in order to identify individual
participants coincident with the tracking of their shapes;

■ using non-visible coatings to mark selected body points on each participant and by
directing the reflected non-visible frequencies entering the overhead filming cameras to a
separate sensor;

• analyzing these coincident non-visible images to identify and track specific body
points on each participant, and

■ creating a grid of overhead cameras whose views overlap so as to collectively form a single
view of the tracking surface below;

• where the area covered by the overlap between any adjacent cameras is enough to
ensure that any foreground object that traverses the junction remains within all
views for a minimal distance;

o where this minimal distance at least includes the size of any player identification
marks such as a helmet sticker,

o where this minimal distance preferably includes enough area to keep a single
participant in view while standing;

■ creating an overhead matrix comprising at least two overhead grids, offset to each other,
such that any foreground object is always in view of at least two cameras, one from each of
the two grids, at all times;

• so that image analysis of these foreground objects from the two separate views can
create three dimensional tracking information;




■ preferably adding a third overhead grid to the overhead matrix such that any foreground
object remains in the view of at least three cameras, one from each of the three grids, at all
times;

• so that more than one camera must malfunction before a foreground object is no
longer seen by two cameras, and

• so that composite images created of the foreground objects may have minimal
distortion by always selecting the one view from any of the three viewing cameras that
is the most centered;

o by using the tracking location and orientation information concerning each participant to
automatically direct a plurality of ID filming cameras affixed from a perspective view throughout
the venue to controllably capture images of selected participants including identifying portions of
their uniforms such as their jersey numbers;

■ to use the captured images of a selected participant's uniform, preferably including their
jersey number, to compare and pattern match against a pre-known database thereby
allowing for participant identification without necessitating the use of an added marking
such as a helmet sticker, and

o by using a wireless handheld device to allow coaches to indicate, in real-time, game moments for
review, where these moments are stored as time markers and cross indexed to both the indicating
coach and the plurality of tracked data and collected film.
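
The background-based extraction of foreground objects named in capability 2 can be illustrated with a minimal sketch. It assumes grayscale images held as nested lists and a fixed difference threshold; the function name `extract_foreground` and the threshold value are hypothetical, and a real extracting hub would work on camera data in hardware and would also keep the background updated.

```python
def extract_foreground(background, current, threshold=20):
    """Return (bounding_box, block) for the pixels that differ from the background.

    bounding_box is (top, left, bottom, right), inclusive, or None when the
    current image contains no foreground.
    """
    changed = [
        (r, c)
        for r, row in enumerate(current)
        for c, value in enumerate(row)
        if abs(value - background[r][c]) > threshold
    ]
    if not changed:
        return None, []
    top = min(r for r, _ in changed)
    bottom = max(r for r, _ in changed)
    left = min(c for _, c in changed)
    right = max(c for _, c in changed)
    # Only this minimal block, not the full frame, needs to be passed downstream.
    block = [row[left:right + 1] for row in current[top:bottom + 1]]
    return (top, left, bottom, right), block


background = [[10] * 6 for _ in range(5)]      # empty playing surface
current = [row[:] for row in background]
current[1][2] = 200                            # two pixels of a foreground object
current[2][3] = 180
box, block = extract_foreground(background, current)
```

Here a 30-pixel frame is reduced to a 4-pixel block plus its location, which illustrates how the total pixel area sent to the analysis computer can shrink.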
3. automatically assembling a single composite overhead view of the game based upon the video images
captured by the tracking system:

o where an automatic video content assembly and compression computer system ultimately sorts
and combines the video information of the extracted foreground objects contained in all of the
incoming streams being received from one or more multiplexing hubs, themselves receiving from
other multiplexing hubs or extraction hubs, themselves receiving from all cameras within all the
overhead grids comprising the overhead matrix;

■ where any foreground object determined to have been touching one or more edges of its
capturing camera's view, is first combined with any extracted foreground objects from
adjacent cameras within the same overhead grid that are overlying along one or more
equivalent physical pixel locations,

• so that a multiplicity of contiguous foreground objects, from a single overhead grid,
are first constructed from the pieces captured by adjacent cameras within that grid;

■ where each constructed or otherwise already contiguous foreground object captured within
a single grid is then compared to the foreground objects, determined to be occupying the
same physical space, that were captured from the one or preferably two other overhead
grids;



• where the result of the comparison is to select the one view of the foreground object
that contains the least image distortion;

■ where each minimally distorted contiguous foreground object may comprise one or more
participants;

• where these foreground objects may be determined to contain more than one
participant by detecting the presence of more than one helmet sticker or other
identifying mark, or

• where the total pixel mass of the contiguous foreground object is determined to be that
reasonably expected for a given number of participants greater than one;

■ where contiguous foreground objects determined to comprise more than one participant are
then preferably broken into separate smaller foreground objects centered about the best
estimated location of each detected participant;

• where the separate smaller objects are thought to contain only a single participant and
are indexed at least according to the identity of that participant, and

• where it is immaterial that body portions of one participant are included in the
separated smaller objects of an adjoining participant, if at least the total video
information contained in the forcibly separated smaller objects equals the total video
information of the original contiguous (larger) foreground object;

■ so that a single collection of the least distorted views of all participants, broken up and
indexed by participant and game objects as best as is possible, is created with minimal
delay from real-time for each beat of image capture across all cameras in the overhead
matrix;

• where the expected beats of image capture might be every 1/30th, 1/60th or 1/120th of a
second and faster,

• where the same separate participant or game object images are then sorted into distinct
streams within the time (or temporal) domain as each successive beat of the capturing
cameras creates an additional single collection of least distorted views, and

• where any unidentifiable objects from a single collection form their own distinct
temporal stream with any other unidentifiable objects, determined to overlie the same
physical locale, from the next single collection.
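
The selection of the least distorted view in capability 3 can be sketched as follows. The sketch assumes that distortion grows with distance from a camera's view center, so the "most centered" camera is the one whose view center lies nearest the tracked object; the camera identifiers and coordinates are hypothetical.

```python
import math


def most_centered_view(object_xy, cameras):
    """Pick the camera whose view center is nearest the object.

    cameras maps a camera id to the (x, y) center of its view in venue coordinates.
    """
    return min(cameras, key=lambda cam: math.dist(object_xy, cameras[cam]))


# One candidate camera from each of three offset overhead grids (hypothetical layout).
cameras = {
    "grid1-cam4": (10.0, 10.0),
    "grid2-cam3": (12.5, 12.5),
    "grid3-cam7": (7.5, 12.5),
}
best = most_centered_view((12.0, 11.5), cameras)
```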

4. collecting video from one or more perspective view cameras that are automatically directed to follow the
game action based upon the determined participant and game object movement:

o by using the tracking location and orientation information concerning each participant and the
game object to automatically direct a plurality of game filming cameras affixed from distinct
perspective views throughout the venue;






■ where the pan / tilt and zoom settings of each perspective filming camera are automatically
controlled and the capturing of images is restricted to distinct combinations of these
settings rather than a particular fixed time beat such as 1/30th or 1/60th of a second;

■ where for each possible distinct combination of pan / tilt and zoom settings, an image is
first captured when the venue background is known to be absent of foreground objects, the
background image of which can then be used during game filming to support the real-time
extraction of foreground objects as they traverse the background thereby supporting image
compression;

• where the total collection of background images for a given perspective camera,
covering all possible distinct combinations of pan / tilt and zoom (P/T/Z) settings, are
additionally combined to form a single larger background panoramic;

o where this panoramic can be queried based upon the current P/T/Z settings of the
associated filming cameras in order to extract the equivalent original venue
background overlapping the current image;

■ where the extracted foreground objects from each current frame of each perspective filming
camera are broken into separate streams by participant in a manner similar to that taught for
the overhead filming system, based upon tracking information determined by the overhead
system;

■ where a table of pre-known color tones are established for all participant skin complexions
as well as home and away uniforms, such that each pixel in the extracted foreground
images can be encoded to represent one of these color tones less a grayscale overlay
thereby increasing image compression;

■ using non-visible coatings to mark selected body points on each participant and directing
the reflected non-visible frequencies entering the perspective filming cameras to a separate
sensor;

• analyzing these coincident non-visible images to identify and track specific body
points on each participant;

o by using transponders to track the location and orientation of one or more roving, manually
operated filming cameras so as to align its captured film with the determined location and
orientation of the participants and game objects, and

o by using transponders to track the location of selected spectators and to controllably direct
spectator filming cameras based upon the determined game actions of the participants and their
relationship to the tracked spectators.
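
The idea in capability 4 of storing one background image per distinct pan / tilt / zoom combination can be sketched with a simple lookup keyed by those settings. The sketch assumes evenly spaced allowed settings and snaps a requested setting to the nearest one; the class name `BackgroundStore`, the step sizes and the image labels are hypothetical.

```python
def snap(value, step):
    """Snap an angle or zoom value to the nearest allowed, repeatable setting."""
    return round(value / step) * step


class BackgroundStore:
    """Pre-captured empty-venue images indexed by allowed P/T/Z combinations."""

    def __init__(self, pan_step=5.0, tilt_step=5.0, zoom_step=1.0):
        self.steps = (pan_step, tilt_step, zoom_step)
        self.images = {}

    def key(self, pan, tilt, zoom):
        return tuple(snap(v, s) for v, s in zip((pan, tilt, zoom), self.steps))

    def store(self, pan, tilt, zoom, image):
        self.images[self.key(pan, tilt, zoom)] = image

    def lookup(self, pan, tilt, zoom):
        # Returns None when no background was captured for that combination.
        return self.images.get(self.key(pan, tilt, zoom))


store = BackgroundStore()
store.store(10.0, 5.0, 2.0, "background-A")   # captured while the venue was empty
found = store.lookup(11.9, 4.2, 2.2)          # nearest allowed setting is (10, 5, 2)
missing = store.lookup(30.0, 0.0, 1.0)
```

Because every game image is taken at a setting that already has a stored background, foreground extraction for the perspective cameras can reuse the same subtraction approach as the fixed overhead cameras.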
5. automatically collecting game audio and creating matched volume and tonal mappings:

o by using audio recorders placed throughout the venue to capture a three-dimensional soundscape
of the game that is stored both in traditional audio formats, and

o by sampling the traditional audio recording in order to create compressed volume and tonal maps
that may be used to drive a synthesized rendering of crowd noise.




6. analyzing participant and game object movement to create game statistics and performance
measurements forming a stream of game metrics:

o where the continuum of tracked locations, orientations and identities of the participants and the
game object is interpreted as a series of distinct and overlapping events, where each event is
categorized and associated at least by time sequence with the tracking and filming databases;

■ where any given overhead or perspective filming camera may be operated at some multiple
of the standard motion frame rate, typically 30 fps, in order to capture enough video to
support slow and super-slow motion playback, and

• where the criticality of a given event determined to be in view of a given filming
camera is used to automatically determine if these extra multiples of video frames
should be kept or discarded;

■ by using these interpreted events to automatically accumulate basic game statistics;

o including the capturing of subjective assessments of participant performance, typically from the
coaching staff after the game has completed, where these assessments are comparable to
those made objectively based upon the automatically interpreted events and statistics, thereby
forming a feedback loop provided to both the subjective and objective analysis sources in order to
help refine their respective assessment methods.
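
The keep-or-discard decision for extra high-rate frames in capability 6 can be sketched as below. The sketch assumes each event carries a criticality score between 0 and 1 and that a fixed threshold separates events worth slow-motion replay from routine play; the function name, the score scale and the threshold are hypothetical.

```python
def frames_to_keep(frames, capture_multiple, criticality, threshold=0.7):
    """Keep every frame for critical events, otherwise only the standard-rate frames."""
    if criticality >= threshold:
        return list(frames)
    return list(frames[::capture_multiple])


frames = list(range(12))   # 12 frames captured at 4x the standard rate
routine = frames_to_keep(frames, capture_multiple=4, criticality=0.2)
goal = frames_to_keep(frames, capture_multiple=4, criticality=0.9)
```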

7. automatically creating performance descriptor tokens based upon the game metrics describing the
important game activities:

o by creating a three-dimensional venue model that calibrates the tracking and filming cameras into
a single local coordinate system, from which the interpreted events can be translated in
combination with predefined game rules into at least the recording of game scoring and other
traditional statistics, and

o by using participant and game object movements as calibrated to the playing venue along with the
interpreted events, scoring and other statistics to generate a continuous output of descriptive
tokens that themselves can be used as input into a text-to-speech synthesis module for the
automatic creation of game commentary.
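
The conversion of descriptive tokens into commentary text in capability 7 can be sketched with simple templates. The token fields (`event`, `player`, `zone`) and the template wording are hypothetical; the specification only states that tokens are translated into text by a set of rules before text-to-speech synthesis.

```python
# Hypothetical translation rules: one sentence template per event type.
TEMPLATES = {
    "FACEOFF": "Face-off in the {zone}.",
    "SHOT": "{player} takes a shot from the {zone}.",
    "GOAL": "{player} scores!",
}


def tokens_to_text(tokens):
    """Translate a stream of descriptive tokens into sentences for text-to-speech."""
    lines = []
    for token in tokens:
        template = TEMPLATES.get(token["event"])
        if template is None:
            continue  # no rule for this event type, so nothing is said
        lines.append(template.format(**token))
    return lines


tokens = [
    {"event": "FACEOFF", "zone": "neutral zone"},
    {"event": "SHOT", "player": "Number 17", "zone": "left circle"},
    {"event": "ICING"},
    {"event": "GOAL", "player": "Number 17"},
]
commentary = tokens_to_text(tokens)
```

Keeping the rules in a table separate from the tokens is one way to let a viewer's preferred commentary style be swapped in without changing the transmitted token stream.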

8. dynamically assembling combinations of the video, game metrics, performance tokens and audio
information into an encoded broadcast based upon remote viewer directives:

o where the assembled video stream may comprise:

■ the single composite overhead view of the game encoded as a traditional stream of current
images;

■ one or more perspective views of the game encoded as a traditional stream of current
images;

■ either or both of the overhead and perspective views alternatively encoded as a derivative
of the traditional streams of current images encoded as:

• streams of extracted blocks minimally containing all of the relevant foreground
objects;




o where the pan / tilt and zoom settings associated with each and every
current stream, for each perspective view camera, are also transmitted;

• "localized" sub-streams of extracted blocks further sorted in the spatial domain based
upon the identification of the player primarily imaged in the block;

• "normalized" sub-streams of "localized" extracted blocks further expanded and
rotated so as to minimize expected player image motion within the temporal domain;

• "localized" and "normalized" sub-streams further separated into face and non-face
regions;

• separated non-face regions further separated into color underlay and grayscale overlay
images, and

• color underlay images encoded as color tone regions;

• any of the derivative forms of the traditional streams alternately encoded as gradient
images;

■ the single composite overhead view represented in a symbolic, rather than video or gradient
format;

o where the assembled metrics stream may comprise:

■ an ongoing accumulation of performance measurements and analysis derived from the
continuous stream of participant and game object tracking information created via image
analysis of the single composite overhead view;

o where the assembled audio stream may comprise:

■ the traditional ambient audio recordings of the venue surroundings, or,

■ compressed volume and tonal maps derived from the ambient audio recordings that
may be used to direct the automatic generation of synthesized crowd noise;

■ a stream of tokens encoding a description of the game activities that may be used to direct
the automatic generation of synthesized game commentary;

o by using the determined game stop and re-start times along with the interpreted events to
selectively alter the contents of the video stream;

■ where alternative perspective view angles may be added to the stream based upon the
measured game activities in order to serve as replays;

■ where additional captured images greater than the traditional 30 frames per second may be
transmitted and then added to the prior transmitted original 30 frames per second in order to
allow for slow motion replays;

o by receiving user profile and preferences along with direct interactive user feedback in order to
change any portion of the video, metrics or audio streams.

9. transmitting the broadcast and receiving back interactive viewer directives:

o using current standards such as broadcast video for television and MPEG-4 or H.264 for the
Internet, or




o using variations of current standards designed to take advantage of the additional information
created by the present application that support higher levels of broadcast stream compression.

10. decoding the transmitted broadcast into a stream of video and audio signals capable of being
presented on the viewing device, where:

o selected information is transmitted, or otherwise provided to the decoding system prior to
receiving the transmitted broadcast including:

■ a 3-D model of the venue in which the contest is being played;

■ a database of "natural" background images, one image for each allowed pan / tilt and zoom
setting for each perspective view camera;

• a panoramic background for each perspective view camera representing a compressed
compilation of the database of "natural" background images;

■ a database of advertisement images mapped to the 3-D venue model;

■ a color tone table representing the limited number of possible skin tone, uniform and
equipment colors to be used when decoding the video stream;

■ a database of standard poses of the participants expected to play in the broadcasted game
cross-indexed at least by participant identification and also by pose information including
orientation and approximate body pose;

• where the standard poses for each participant are pre-captured in the same uniforms
and equipment they are expected to be wearing and using during the broadcasted
contest;

■ a database of translation rules controlling how the stream of tonal and volume map
information is to be converted into synthesized crowd noise;

■ a database of translation rules controlling how the stream of tokens encoding the game
activities are to be converted into text for subsequent translation from text-to-speech;

o selected information is accepted locally, on the decoding system, for use in directing what
information is included in the broadcast and how this information is presented, such as:

■ a viewer profile and preferences database that is established prior to the broadcast and
includes information such as:

• the viewer's name, age, address, relationship to the event as well as other traditional
demographic data;

• the viewer's preferences, at least including indicators for:

o using natural or animated backgrounds;

o using the background from the actual or a substitute facility;

o using natural or synthesized crowd noise;

o the voices to be used for the synthesized audio game commentary, and

o the style of presentation.




■ the same viewer profile and preferences database that is amended before and during the
broadcast to include viewer indications of:

• the distinct overhead and perspective views to be transmitted;

• the format of the transmitted overhead stream such as natural, gradient or symbolic;

• the format of each of the transmitted perspective streams such as natural or gradient;

• the detail of the metrics stream;

• the inclusion of the performance tokens necessary to automate the synthesized game
commentary, and

• the format of the audio stream such as natural or synthesized (and therefore based
upon the volume and tonal maps).

o selected portions of the transmitted broadcast are saved off into a historical database for use in the
present and future similar broadcasts, the information including:

■ a database of captured game poses of the participants playing in the broadcast event stored
and cross-indexed at least by participant identification and also by pose information
including orientation and approximate body pose;

■ a database of accumulated performance information concerning the teams and participants
of the current broadcast, and

■ a database of the automatically chosen translations of descriptive tokens used to drive the
synthesized game commentary;

o decoding is based upon current standards such as broadcast video for television and MPEG-4 or
H.264 for the Internet, including additional optional steps for:

■ recreating natural and / or animated backgrounds;

■ overlaying advertisements onto the recreated background;

■ overlaying graphics of game performance statistics, measurements and analysis onto the
recreated background;

• where the above steps of recreating the background and overlaying advertisements and
other graphics are based primarily upon information including:

o the three-dimensional venue layout,

o the relative location of the associated perspective filming camera,

o the transmitted pan / tilt and zoom settings for each current image, and

o the information available in the viewer preferences and profile dataset;

■ translating the decoded pixels of the foreground participants via the pre-known color tone
table into true color representations to be mixed with the separately decoded grayscale
overlay information;

■ overlaying the decoded extracted blocks of foreground participants and game objects onto
the recreated background based upon the transmitted relative location, orientation and / or
rotation of the extracted blocks;




■ adding the actual venue recordings or creating synthesized crowd noise based upon the
transmitted volume and tonal maps,

■ creating synthesized game commentary based upon the transmitted game descriptive tokens
derived from the interpretation of tracking data, and

■ inserting advertisement video / audio clips interwoven with the transmitted game activities 
based upon the tracked and determined game stop and re-start times. 

Many of the above stated objects and advantages are directed towards subsystems that have novel and
important uses outside of the scope of an Automatic Event Videoing, Tracking and Content Generation
System, as will be understood by those skilled in the art. Furthermore, the present invention provides many
novel and important teachings that are useful, but not mandatory, for the establishment of an Automatic
Event Videoing, Tracking and Content Generation System. As will be understood by a careful reading of
the present and referenced applications, any automatic event videoing, tracking and content generation
system does not necessarily need to include all of the teachings of the present inventors but preferably
includes at least those portions in combinations claimed in this and any subsequent related divisional or
continued applications. Still further objects and advantages of the present invention will become apparent
from a consideration of the drawings and ensuing descriptions.

Disclosure of Invention 

Referring to Fig. 1, the Automatic Event Videoing, Tracking and Content Generation System 1 comprises
nine sub-systems as follows:

1- A tracking system 100 that first creates a tracking database 101 and overhead image database 102;

2- An automatic game filming system 200 that inputs data from the tracking database 101, maintains
the current pan / tilt orientation and zoom depth of all automatic cameras in center-of-view
database 201 and collects film database 202;

3- An interface to manual game filming 300 that maintains the current location, pan / tilt orientation
and zoom depth of all manual filming cameras in camera location & orientation database 301 and
collects film database 302;

4- An automatic spectator tracking & filming system 400 that maintains the current location of all
tracked spectators in spectator tracking database 401 and then collects a spectator A/V (audio /
video) database 402;

5- A player & referee identification system 500 that uses image recognition of jersey numbers to
update the tracking database 101;

6- A game clock and official scoring interface system 600 that updates the tracking database with
clock start and stop time information;

7- A performance measurement & analysis system 700 that inputs data from tracking database 101
and creates performance analysis database 701 and performance descriptors database 702;

8- An interface to performance commentators 800 that collects V/A (video/audio) information from
live commentators for storage in commentator V/A (video/audio) database 801 and inputs
information from performance analysis database 701 and performance descriptors database 702
from which it generates automated commentator descriptors 802, as would be used with a speech
synthesis system;

9- An automatic content assembly & compression system 900 that receives input from every
database created by systems 100 through 800 in addition to three-dimensional venue model
database 901 and three-dimensional ad model database 902 and then selectively and conditionally
creates a blended audio / video output stream that is compressed and stored as encoded broadcast
904. Broadcast 904 is then optionally transmitted either over local or remote network links to a
receiving computer system running broadcast decoder 950 that outputs automatic sports broadcast
1000.
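
By way of illustration only, the data flow among the sub-systems of Fig. 1 might be summarized as in the following sketch. Only the reference numerals and the read / write relationships are taken from the description above; the dictionary layout, the key names and the helper `databases_written_by` are hypothetical and are not part of the disclosed system.

```python
# Hypothetical summary of the Fig. 1 data flow. The structure and key names
# are illustrative assumptions; only the numerals come from the description.
SUBSYSTEMS = {
    100: {"name": "tracking", "reads": [], "writes": [101, 102]},
    200: {"name": "automatic game filming", "reads": [101], "writes": [201, 202]},
    300: {"name": "interface to manual game filming", "reads": [], "writes": [301, 302]},
    400: {"name": "automatic spectator tracking & filming", "reads": [], "writes": [401, 402]},
    500: {"name": "player & referee identification", "reads": [], "writes": [101]},
    600: {"name": "game clock and official scoring interface", "reads": [], "writes": [101]},
    700: {"name": "performance measurement & analysis", "reads": [101], "writes": [701, 702]},
    800: {"name": "interface to performance commentators", "reads": [701, 702], "writes": [801, 802]},
}


def databases_written_by(first, last):
    """Collect every numeral written by systems numbered first..last."""
    written = set()
    for number, system in SUBSYSTEMS.items():
        if first <= number <= last:
            written.update(system["writes"])
    return sorted(written)


# System 900 receives input from everything created by systems 100 through
# 800, plus venue model database 901 and ad model database 902.
ASSEMBLY_INPUTS = databases_written_by(100, 800) + [901, 902]
```

Keeping the relationships in one table makes it straightforward to check which sub-systems must be running before assembly system 900 can produce encoded broadcast 904.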

Note that the tracking system 100, as well as aspects of the automatic game filming system 200 and the
performance measurement & analysis system 700, is based upon earlier applications of the present
inventors of which the present invention is a continuation-in-part. Those preceding applications are herein
incorporated by reference and include:

1- Multiple Object Tracking System, filed Nov. 20, 1998, now U.S. Patent 6,567,116 B1;

2- Method for Representing Real-Time Motion, filed — ; 

3- Optimizations for Live-Event, Real-Time, 3-D Object Tracking, filed — . 

The present specification is directed towards the additional teachings of the present inventors that
incorporate and build upon these referenced applications. For the purpose of clarity, only those
descriptions of the tracking system 100, the automatic game filming system 200 and the performance
measurement & analysis system 700 that are necessary and sufficient for specifying the present automatic
sports broadcast system 1 are herein repeated. As with these prior references, the present invention
provides its examples using a description of ice hockey although the teachings included in this and prior
specifications are applicable to sports in general and to many other applications beyond sports. These other
potential applications will be discussed further in the Conclusion to this Specification.

Referring next to Fig. 2, there is shown tracking system 100 first comprising multiple cameras 25, each
enclosed within case 21, forming fixed overhead camera assembly 20c and mounted to the ceiling above
ice surface 2, such that they cover a unique but slightly overlapping section of surface 2 as depicted by
camera field-of-view 20v. Images captured by each individual overhead camera assembly 20c are received
by image analysis computer 100c that then creates a tracking database 101 of 2-D player and puck
movement; the methods of which will be described in the ensuing paragraphs. Tracking computer 100c
also receives continuous images from perspective view camera assemblies 30c that allow tracking database
101 to further include "Z" height information, thereby creating a three-dimensional tracking dataset. The
automatic game filming system 200 then inputs player, referee and puck continuous location information
from tracking database 101 in order to automatically direct one or more filming camera assemblies 40c.
Assemblies 40c capture the action created by one or more players 10 with puck 3 for storage in automatic
game film database 202. Note that combined fields-of-view 20v of the multiple overhead camera
assemblies 20c are ideally large enough to cover player bench areas 2f and 2g as well as penalty box area
2h and entrance / exit 2e. In this way, players and referees are constantly tracked throughout the entire
duration of the game even if they are not in the field-of-play or if there is a stoppage of time.

Referring next to Fig. 3, there is shown an alternate depiction of the same concepts illustrated in Fig. 2. As
can be seen, tracking system 100 first comprises a matrix of camera assemblies 20cm forming a regular
and complete grid over tracking surface 2 as well as the immediate surrounding entrance / exit, player rest
areas 2f and 2g and penalty area 2h. Each assembly 20c is so aligned next to its neighbors such that its
field-of-view 20v overlaps by ideally at least an amount sufficiently greater than the maximum size of
helmet sticker 9a on player 10. In this way, sticker 9a will constantly be visible within at least one
field-of-view 20v. As players such as 10 proceed from entrance / exit 2e onto tracking surface 2 and
ultimately into and out of rest areas 2f and 2g and penalty area 2h their constant location is tracked by
image analysis computer 100c. The constant location of referees, the puck and other movable game
equipment such as sticks in the case of ice hockey are also tracked and recorded by analysis computer 100c.
This tracking information is communicated via network in real-time to automatic game filming system 200
that controls a multiplicity of filming camera assemblies 40c placed throughout the player venue.

It should be noted that the overhead and perspective film gathered by system 100 via first overhead camera
assemblies 20c and second perspective camera assemblies 30c are time synchronized with the film
gathered by automatic filming camera assemblies 40c. As will be taught in the present invention, at least
tracking camera assemblies 20c and 30c, and preferably including filming assemblies 40c, receive their
power signals in coordination with the lighting system used in the tracking venue. As will be shown in
discussion of Fig.'s 5b and 5c, this allows the images captured by these camera assemblies 20c, 30c and
40c to be synchronized to the "on" cycles of the alternating current that drives the lighting system, thus
ensuring maximum image brightness and consistency of brightness across multiple images. In this case, all
of the cameras are controlled to be "power" synchronized to an even multiple of the alternating frequency
of the power lines. This frequency will not exactly match the existing frequency of state of the art filming
cameras that is built around the television broadcast NTSC standard, that is 29.97 frames per second. As
will be further taught especially in discussion of Fig. 11a, there is significant advantage to further
controlling the shutter of the filming camera assemblies 40c to be additionally synchronized to a finite set
of allowed pan and tilt angles as well as zoom depths. This subsequent "motion" synchronization is then
ideally merged with the "power" synchronization forming a "motion-power" synchronization for at least
filming assemblies 40c, but also ideally for perspective camera assemblies 30c. The anticipated shutter
frequency of the "motion-power" synchronized assemblies 30c and 40c may not be regular in interval, and
may not match the shutter frequency of the "power" only synchronized overhead assemblies 20c. In this
case, the sequence of images streaming from the "motion-power" synchronized assemblies 30c and 40c,
that are potentially asynchronous in time, will be assigned the time frame equivalent to either the prior or
next closest image in time captured by the overhead assemblies 20c, that are synchronous in time. In this
way, all film gathered by tracking system 100 and automatic game filming system 200 will be
synchronized, driven by the "time-beat" or frequency of the power lines. It is expected that any differences
between the actual time an image was captured from either an assembly 30c or 40c, and its resulting
assigned time frame, will be minimal and for all intents and purposes undetectable to the human viewer.
Hence, when a viewer is stopping and starting their review of game film taken from either the overhead
assemblies 20c or the perspective assemblies 30c and 40c, they can switch between any of these multiple
views with the perception that they are viewing the same exact instances in actual time, even though they
may not be.
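
As a minimal sketch of the time frame assignment just described, and not a definitive implementation, each potentially asynchronous image may be given the time of the nearest synchronous overhead image. The function name `assign_time_frame`, the use of integer microseconds, the 60 frames per second overhead rate and the tie-breaking rule (ties go to the prior frame) are all assumptions made for illustration.

```python
from bisect import bisect_left


def assign_time_frame(capture_time, overhead_times):
    """Return the overhead frame time closest to capture_time.

    overhead_times must be sorted ascending. Ties go to the prior frame,
    which is an assumption; the text allows either the prior or the next.
    """
    i = bisect_left(overhead_times, capture_time)
    if i == 0:
        return overhead_times[0]
    if i == len(overhead_times):
        return overhead_times[-1]
    prior, nxt = overhead_times[i - 1], overhead_times[i]
    return prior if capture_time - prior <= nxt - capture_time else nxt


# Overhead assemblies 20c at an assumed regular 60 frames per second,
# expressed in integer microseconds.
overhead = [round(k * 1_000_000 / 60) for k in range(6)]

# Irregular capture instants from a "motion-power" synchronized assembly.
ptz_captures = [1_000, 15_000, 40_000, 70_000]
assigned = [assign_time_frame(t, overhead) for t in ptz_captures]
```

With overhead frames roughly 16.7 milliseconds apart, no assigned time in this sketch can differ from the true capture time by more than about 8.3 milliseconds.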

Referring next to Fig. 4a, 4b and 4c, there is shown a sequence of illustrations describing the overall
technique for determining the X-Y locations of all foreground objects via image analysis by computer
100c, while simultaneously accomplishing extraction of the moving foreground image from the fixed
background. First, in Fig. 4a there is depicted player 10, wearing helmet 9 onto which is attached sticker
9a and holding stick 4 near puck 3; all of which are in view of overhead assembly 20c. Assembly 20c
captures and transmits its continuous images to tracking analysis computer 100c that ultimately determines
the location and therefore outlines of foreground objects such as player 10, helmet sticker 9a, stick 4 and
puck 3. Subsequent Fig.'s 5a through 10h will further teach the apparatus and methods illustrated in Fig.'s
4a, 4b and 4c. In Fig. 4b, there is shown current image 10c taken by assembly 20c and subtracted from
pre-stored background image 2r. As will be taught, this and subsequent method steps will yield extracted
foreground objects such as 10e1, 10e2, 10e3 and 10e4 as depicted in Fig. 4c. In this case, foreground
objects 10e1 through 10e4 are the consecutive extractions of player 10. Within each extraction, tracking
analysis computer 100c additionally determines the presence of helmet sticker 9a. Once found, the centroid
of sticker 9a is calculated and, for instance, mapped to the center 2c of the tracking surface 2. This location
mapping can be described in polar coordinates as angle 10e1a and distance 10e1r. Similarly, the location
of puck 3 is tracked and mapped, for instance as angle 3e1a and distance 3e1r.
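
The mapping of a centroid to an angle and distance about center 2c can be sketched as follows. The function names `to_polar` and `to_xy`, the choice of degrees measured counter-clockwise from the positive X axis, and the placement of the center at the origin are assumptions for illustration; the description does not fix these conventions.

```python
import math


def to_polar(x, y, center=(0.0, 0.0)):
    """Map an X, Y centroid to (angle in degrees, distance) about center."""
    dx, dy = x - center[0], y - center[1]
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return angle, math.hypot(dx, dy)


def to_xy(angle, distance, center=(0.0, 0.0)):
    """Inverse mapping from (angle in degrees, distance) back to X, Y."""
    theta = math.radians(angle)
    return (center[0] + distance * math.cos(theta),
            center[1] + distance * math.sin(theta))


# A sticker centroid 3 units along X and 4 units along Y from center 2c.
sticker_angle, sticker_distance = to_polar(3.0, 4.0)
```

Because the two functions are inverses, either coordinate system can be stored and the other recovered on demand.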

It should be noted that the actual local coordinate system used to encode object movement is optional. The
present inventors prefer a polar coordinate system focused around the center of the tracking surface.
However, other systems are possible including an X, Y location method focused on the designated X and
Y, or "north-south / east-west" axis of the tracking surface. This X, Y method will be referred to in the
remainder of the present application, as it is simpler to present than the polar coordinates method. In either
case, what is important is that by storing the continuous locations matched to exact times of various
objects, the tracking system 100 can relay this information in real-time across a network, for instance, to
both the automatic game filming system 200 as well as the performance measurement & analysis system
700. System 700 is then able to calculate many useful measurements beginning with object accelerations
and velocities and leading to complex object interrelationships. For instance, player 10, captured as 10e4,
is determined to have shot puck 3, captured as 3e4, at hockey goal 2h. This shot by player 10 is then
recorded as a distinct event with a distinct beginning and ending time. Further derivations of information
include, for example, the shooting triangle 2t formed by the detected and located end of stick 4, captured
as 4e4 and the posts of goal 2h. Such and similar "content" measurements, while touched upon in the
present invention, will be the focus of an upcoming application from the present inventors.

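
One simple way that velocities and accelerations could be derived from locations matched to exact times is by finite differences, as sketched below. The description does not state how system 700 performs the calculation; the sample format `(t, x, y)`, the function names and the differencing scheme are assumptions.

```python
def velocities(samples):
    """samples: list of (t_seconds, x, y). Returns a list of (t, vx, vy)."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        out.append((t1, (x1 - x0) / dt, (y1 - y0) / dt))
    return out


def accelerations(vels):
    """vels: list of (t, vx, vy). Returns a list of (t, ax, ay)."""
    out = []
    for (t0, vx0, vy0), (t1, vx1, vy1) in zip(vels, vels[1:]):
        dt = t1 - t0
        out.append((t1, (vx1 - vx0) / dt, (vy1 - vy0) / dt))
    return out


# A hypothetical track of one object moving along the X axis.
track = [(0.0, 0.0, 0.0), (0.5, 1.0, 0.0), (1.0, 3.0, 0.0)]
v = velocities(track)
a = accelerations(v)
```

In practice the raw differences would likely be smoothed before use, since small location errors are amplified by each differencing step.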
Referring next to Fig. 5a, there is shown the preferred embodiment of the matrix of overhead camera
assemblies 20cm (depicted in Fig. 3) comprising one or more overhead camera assembly groups, such as
20g-1 and 20g-2. Each group, such as 20g-1, further comprises individual assemblies such as 20c-1, 20c-2,
20c-3 and 20c-4. Multiple assemblies such as 20c-1 through 20c-4, comprising a single group, such as
20g-1, each stream their captured images 10c to a dedicated image extraction hub, such as 26-1.
Subsequently, one or more extraction hubs, such as 26-1 or 26-2, stream their extracted foreground images,
such as 10e-1 and 10e-2&3, and their corresponding symbolic representations, such as 10y-1 and 10y-2&3,
to a multiplexing hub 28 that multiplexes these streams. One or more multiplexing hubs, such as 28, then
pass their extracted image streams, such as 10es, to automatic content assembly & compression system 900
for processing. Hubs, such as 28, also pass their corresponding symbolic representation streams, such as
10ys, to tracking analysis computer 100c.

Overhead camera assemblies such as 20c-1 further comprise lens 25a that focuses light from the scene in
field-of-view 20v onto image sensor 25b. Sensor 25b is preferably a CMOS digital imager as is commonly
available from such suppliers as National Semiconductor, Texas Instruments or Micron. Such imagers are
readily available at different pixel resolutions and different frame rates in addition to monochrome versus
color. The present inventors prefer using sensors from a company known as the Fill Factory who supplies a
monochrome sensor with part number IBIS5A-1300-M2 that can process 630 x 630 pixels at 60 frames per
second. Their equivalent color sensor part number is IBIS5A-1300-C. Image sensors 25b are controlled by
a programmable processing element such as FPGA 25c. Processing element 25c retrieves captured images
10c from sensor 25b in timed synchronization with the "on" cycle of the power lines as they drive the
surrounding lighting system (as will be further described along with Fig.'s 5b and 5c). Processing element
25c of assembly 20c-1, for example, then outputs images 10c across link 27 to the input circuitry of image
extraction hub 26-1. Various input / output protocols are available such as USB or Fire-wire and should be
chosen based upon the frame rate of sensor 25b and the distance between processing element 25c and input
circuitry to hub 26-1, among other considerations. Processing element 26a is preferably a Digital Signal
Processor (DSP) that is capable of executing many complex mathematical transformations on images 10c
at high speeds. Element 26a receives input of one or more image streams 10c from one or more overhead
camera assemblies, such as 20c-1, 20c-2, 20c-3 and 20c-4, depending primarily upon its processing
capabilities and the data input rate. Note that a single hub, such as 26-1, is capable of essentially merging
the multiple fields-of-view of the individual camera assemblies, such as 20c-1 through 20c-4, into a single
combined view 20w as seen by overhead tracking camera grid 20g-1. Hence, the present inventors are
teaching an apparatus that co-joins multiple image sensors into a single larger virtual sensor with a
proportionately increased pixel resolution and field-of-view.

Irrespective of how many individual cameras, such as 20c-1, an individual processing element 26a in a
hub, such as 26-1, can simultaneously combine (e.g. whether one, four or eight cameras), the overall
design remains identical and therefore scalable. For each incoming image 10c, from each inputting camera
20c-1, element 26a first retrieves background image 2r from hub memory 26c to be mathematically
compared to yield resulting foreground object block, e.g. 10e-1. (The method preferred by the present
inventors for this process of foreground extraction is discussed in more detail during the upcoming
discussion of Fig. 6a.) Once foreground images, such as 10e-1, have been extracted, they will ideally
comprise only the portions of image 10c that are necessary to fully contain the pixels associated with
foreground objects such as player 10, helmet sticker 9a or puck 3. Sequential processing element 26b, such
as a microprocessor or FPGA, then examines these extracted regions, such as 10e-1, in order to locate any
helmet stickers 9a and subsequently identify a captured player, such as 10. Element 26b also creates a
symbolic representation, such as 10y-1, associated with each extracted frame, such as 10e-1. This
representation includes information such as:

• The total foreground pixels detected in the extracted block

• The total number of potential pucks located in the extracted block

• For each potential puck detected:

• The X, Y centroid of the puck

• The total number of helmet stickers detected in the extracted block

• For each helmet sticker detected:

• The X, Y centroid of the identified helmet sticker

• The numeric value encoded by the helmet sticker

• The direction in which the helmet sticker is oriented

• If only a single helmet sticker is detected and the number of foreground pixels counted is within the
range expected for a single player,

• then an elliptical shape best fitting the foreground pixels surrounding or near the detected helmet
sticker

• the vectors best representing any detected shape matching that anticipated for a player's stick

• If more than one helmet sticker is detected, or if the number of foreground pixels counted indicates
that more than a single player is present in the current extracted block, then:

• The block is automatically split up along boundary lines equidistant between detected helmet
stickers or determined foreground pixel "weighted centers," where:

• Each weighted center uses calculating steps such as X, Y histograms to determine the center
locations of any preponderance of foreground pixels

After determining extracted blocks such as 10e-1 and their corresponding symbolic representations, such
as 10y-1, hubs, such as 26-1, output this stream to multiplexing hubs, such as 28. As will be appreciated by
those skilled in the art, multiplexing hub 28 effectively joins the multiple lower bandwidth streams from
one or more extraction hubs, such as 26-1 and 26-2, into two higher bandwidth streams, 10es and 10ys, for
input into the next stage. The purpose for this multiplexing of streams is to reduce the number of input /
output ports necessary on the computers associated with the next stages. Furthermore, the stream of
extracted foreground images 10es represents a significantly smaller dataset than the sum total of all image
frames 10c from all the camera assemblies, such as 20c-1, that are required to create a single combined
field-of-view large enough to encompass the entire tracking surface 2 and its surrounding areas such as 2e,
2f, 2g and 2h.
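
For illustration, the symbolic representation accompanying each extracted block, together with the histogram-based "weighted center" calculation, might be sketched as below. The class and field names (`Sticker`, `SymbolicBlock` and so on), the use of degrees for sticker direction, and the restriction of `split_column` to a vertical boundary between two stickers lying side by side are all assumptions; the description lists the information carried but not its encoding.

```python
from dataclasses import dataclass, field


@dataclass
class Sticker:
    centroid: tuple      # (x, y) centroid of the helmet sticker
    value: int           # numeric value encoded by the sticker
    direction: float     # orientation; degrees are assumed here


@dataclass
class SymbolicBlock:
    foreground_pixels: int
    puck_centroids: list = field(default_factory=list)
    stickers: list = field(default_factory=list)


def weighted_center(mask):
    """Center of the foreground pixels of a 0/1 mask via X, Y histograms."""
    col_hist = [sum(col) for col in zip(*mask)]   # foreground count per column
    row_hist = [sum(row) for row in mask]         # foreground count per row
    total = sum(row_hist)
    if total == 0:
        return None
    cx = sum(x * n for x, n in enumerate(col_hist)) / total
    cy = sum(y * n for y, n in enumerate(row_hist)) / total
    return cx, cy


def split_column(sticker_a, sticker_b):
    """X position of a vertical boundary equidistant between two stickers."""
    return (sticker_a.centroid[0] + sticker_b.centroid[0]) / 2


mask = [[0, 1, 1, 0],
        [0, 1, 1, 0]]
center = weighted_center(mask)
boundary = split_column(Sticker((2, 5), 7, 0.0), Sticker((10, 5), 12, 90.0))
```

A representation of this kind is small compared with the pixels of the block itself, which is consistent with streaming 10ys separately to tracking analysis computer 100c.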




Referring next to Fig. 5b, metal halide lamps are a typical type of lighting used to illuminate large areas
such as an indoor hockey rink. These lamps use magnetic ballasts that are directly coupled to the 60 Hz
power lines running throughout the rink. In Fig. 5b, there is shown the 60 Hz waveform 25p of a typical
power line. Ideally, all of the lighting used to illuminate the tracking area is driven from the same power
lines and is therefore receiving the same waveform 25p. The metal halide lamps connected to these ballasts
will regularly discharge and re-ignite each half-cycle of the power line waveform 25p. Although
undetectable to the naked eye, the lighting in such a configuration is actually fluctuating on and off 120
times per second. When image sensors such as 25b are being used to capture high-speed sports action, the
shutter speed of assembly 20c is ideally set at 1/500th to 1/1000th of a second or greater. At these speeds, it
is necessary to synchronize the capturing of images off of the sensor 25b to the maximum discharging
25md of energy through the lamps. Otherwise, the images will vary in ambient brightness causing
degradation in image analysis performance. Although current state of the art industrial cameras do allow
external control of their shutters, they are designed to capture images at the NTSC broadcast industry
standard of 29.97 frames per second. At this rate, the frequency of image captures will tend to drift through
the out-of-synch on-off cycle of the lamps thereby creating a pulsating dimming of the resulting image
stream. The present invention uses the same power lines that drive the tracking surface lighting to drive the
filming cameras. First, the 60 Hz sinusoidal waveform 25p is converted into a 60 Hz square waveform 25s
that is then used by processing element 25c to trigger the electronic shutter of assembly 20c at instances
that correspond to the determined maximum discharge 25md that itself corresponds to the peak of the
sinusoidal wave 25p. Fig. 5b shows the series 25d1, 25d2, 25d3 through 25d8 of instances along power
line waveform 25p when all of the connected lamps are expected to discharge. Also depicted is the series
of signals 25s1 and 25s2 that are used by processing element 25c to controllably activate the electronic
shutter of camera sensor 25b; thus accomplishing "power" synchronization of all tracking camera
assemblies such as 20c and 30c as well as filming cameras such as 40c with the venue lighting and
each other. The actual selection for frequency of signal 25s, programmed into processing elements such as
25c, will be the appropriate sub-integral of the power-line frequency offering the desired frame rate, e.g.
30, 60, 90, 120 fps that matches the image sensor's, such as 25b, functionality.
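
The timing relationship may be sketched as follows, under the simplifying assumptions that waveform 25p is an ideal sinusoid with zero phase at time zero and that the shutter fires on every n-th discharge peak. The function names are hypothetical. Under this particular every-n-th-peak scheme only frame rates that divide the number of peaks per second evenly (e.g. 30, 60 or 120 for a 60 Hz line) are produced; other rates would need a triggering pattern that is not sketched here.

```python
from fractions import Fraction


def discharge_instants(line_hz, count):
    """First `count` instants (seconds) at which |sin(2*pi*f*t)| peaks."""
    return [Fraction(2 * k + 1, 4 * line_hz) for k in range(count)]


def shutter_instants(line_hz, fps, count):
    """Pick every n-th discharge peak so that captures occur at `fps`."""
    peaks_per_second = 2 * line_hz          # one peak per half-cycle
    if peaks_per_second % fps:
        raise ValueError("fps must divide the number of peaks per second")
    step = peaks_per_second // fps
    return discharge_instants(line_hz, count * step)[::step]


peaks = discharge_instants(60, 4)         # 1/240, 3/240, 5/240, 7/240 s
frames_30 = shutter_instants(60, 30, 3)   # every 4th peak of a 60 Hz line
```

Exact fractions are used so that the spacing between successive triggers can be compared without floating point rounding.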

As will be understood by those skilled in the art, assemblies such as 20c, that capture images at a rate of 30
frames per second, are operating faster than the NTSC standard. Therefore, by dropping an extra frame
over a calculated time period they can be made to match the required broadcasting standard transmission
rate. For instance, every second that assembly 20c operates at 30 frames per second, it would be creating
30 - 29.97 = 0.03 more frames than necessary. To accumulate one entire extra frame it will take 1 / 0.03 =
33 1/3 seconds. Hence, after 33 1/3 seconds assembly 20c will have captured 33.333 * 30 = 1000 images.
Over this same 33 1/3 seconds, the NTSC standard will have required the transmission of 33.333 * 29.97 =
999. Assembly 20c will have created 1 more frame than required by the NTSC standard which can simply
be dropped.
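
The arithmetic above can be checked with exact fractions, as in the following sketch. The variable names are illustrative, and 29.97 is taken as the NTSC figure exactly as it is used in the text.

```python
from fractions import Fraction

capture_fps = Fraction(30)
ntsc_fps = Fraction(2997, 100)                   # 29.97, as used in the text

surplus_per_second = capture_fps - ntsc_fps      # extra frames each second
seconds_per_drop = 1 / surplus_per_second        # time to accumulate 1 frame
captured = capture_fps * seconds_per_drop        # frames captured in that time
required = ntsc_fps * seconds_per_drop           # frames NTSC requires
```

Dropping one frame out of every 1000 captured therefore brings a 30 frames per second stream down to the 29.97 rate.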

Referring next to Fig. 5c, there is depicted the same waveform 25p that has been additionally clipped in
order to remove a certain number of its power cycles. By so doing, venue lighting is effectively "dimmed"
to the spectator's and participant's perception. However, tracking system 100 continues to receive well lit
images via assemblies 20c that remain synchronized to the remaining "on" cycles of the additionally
clipped waveform 25p. It should be noted that this technique can be used to synchronize camera
assemblies such as 20c, 30c and 40c to area strobe lighting thereby ensuring that images are captured only
when the strobe is firing.

Referring next to Fig. 6a, there are depicted the preferable steps of the method for extracting the foreground
image from the current frame as follows:

Step 1 involves capturing and storing an image of the background 2r prior to the introduction of
foreground objects. For instance, an image of the ice surface 2 prior to the presence of any players 10 or
puck 3.

Step 2 involves the capturing of current images 10c by camera assemblies such as 20c, 30c or 40c. For
instance, as controlled by processing element 25c to capture images off of sensor 25b in camera assembly
20c-1.

Step 3 involves the mathematical subtraction of current image 10c from background image 2r yielding
subtracted image 10s. The present invention works with either grayscale or color representations of the
current image 10c. With grayscale, each pixel may for instance take on a value of 0 = black to 256 = white.
These grayscale values are directly available in the case where the image sensor 25b is monochrome and
can be easily calculated in the case where image sensor 25b is color, as will be understood by those skilled
in the art. Once the image is acquired, the subtraction process is performed by processing element 26b
yielding pixel by pixel difference values. Pixels that do not represent a foreground object will have
minimal to no subtracted difference value from the corresponding background pixel. Element 26b then
compares this difference value of the subtracted pixels to a threshold, below which the given pixel in the
subtracted image 10s is treated as identical to the corresponding pixel in the background image 2r. If the
pixel is determined to be within the threshold considered identical, then the corresponding pixel in the
subtracted image 10s is set to 0, or black, otherwise it is left alone.
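
Step 3 can be sketched on small grayscale images as follows. This is a minimal sketch and not the inventors' implementation: the function name `subtract_background`, the use of nested lists, the fixed threshold of 10 and the choice to keep the absolute difference for pixels that are "left alone" are assumptions.

```python
def subtract_background(current, background, threshold):
    """Return the subtracted image for two equally sized grayscale images.

    Pixels whose absolute difference is below `threshold` are set to 0.
    Other pixels keep their absolute difference value, which is an
    assumption; the text says only that such pixels are left alone.
    """
    subtracted = []
    for cur_row, bg_row in zip(current, background):
        row = []
        for cur, bg in zip(cur_row, bg_row):
            diff = abs(cur - bg)
            row.append(0 if diff < threshold else diff)
        subtracted.append(row)
    return subtracted


background = [[200, 200, 200],
              [200, 200, 200]]
current = [[198, 40, 203],
           [200, 35, 201]]
subtracted = subtract_background(current, background, threshold=10)
```

Small differences caused by sensor noise or slight lighting variation fall below the threshold and vanish, leaving only the dark middle column as foreground.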

Step 3a involves the pixel by pixel examination of the resulting subtracted image 10s in order to determine
the minimum rectangle, bounding box 10m-bb, required to enclose any contiguous foreground object.
Since Step 3 essentially removes all background pixels by setting them to a 0 value, then the foreground
image is simply determined to be any pixel with a value greater than 0. The present inventors prefer
searching the image in regular progression such as row by row, top to bottom. However, as will be
understood by those skilled in the art, other methods are possible. For instance, the present system is
ideally designed to have a minimum of two pixels on any given foreground object to be detected. In
practice, there may be three pixels per inch or higher resolution while the smallest expected foreground
object for hockey would be the puck 3. The diameter of a regulation size puck 3 is roughly three inches
while its thickness is roughly one inch. Hence, even while rolling perfectly on its edge, puck 3 will take up
at least three pixels along one axis and nine along the orthogonal axis. For this reason, the preferred regular
progression is additionally modified to, first, always cover the outer edges of the frame in order to identify
foreground objects that are overlapping with adjacent views 20v, and, second, to skip every "X" rows and
"Y" columns. The parameters of "X" and "Y" are preferably dynamic and modifiable based upon the
ongoing image analysis. For instance, at a minimum, each parameter would be set to "X" = "Y" = 2 pixels
thereby directing the search to pick up the 1st, 4th, 7th, 10th row and column respectively. This would reduce
the total pixels to be minimally searched to 33% * 33% = 11% (approximately). Under other
circumstances, both parameters could be significantly increased since the next largest foreground object for
ice hockey is a player's 10 glove. Such an object might take up a minimum of twenty by twenty pixels,
thus allowing for "X" = "Y" = 20. This increase of parameter could be set under the feedback condition
that indicates that the puck is not expected to be within the view 20v of a given assembly 20c or,
conversely, has now been found within view 20v. Furthermore, since most often a player 10 does not lose
their glove, the practical minimal object will be the player 10 or their stick. In these cases, the parameters
of "X" and "Y" can be greatly increased.
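
The effect of the skip parameters on the number of pixels examined can be estimated with the sketch below. The helper names are hypothetical, the 630 x 630 frame size is taken from the sensor described earlier, and the edge handling is simplified: the last row and column are merely added to the sampled grid, and full pixel-by-pixel coverage of the outer edges is not modeled.

```python
def sampled_indices(length, skip):
    """0-based indices examined when `skip` rows (or columns) are skipped
    after each examined one, with the first and last index always kept."""
    picked = set(range(0, length, skip + 1))
    picked.update({0, length - 1})
    return sorted(picked)


def searched_fraction(height, width, skip_x, skip_y):
    """Fraction of all pixels that lie on the sampled row / column grid."""
    rows = sampled_indices(height, skip_x)
    cols = sampled_indices(width, skip_y)
    return (len(rows) * len(cols)) / (height * width)


rows = sampled_indices(630, 2)          # indices 0, 3, 6, 9, ... and 629
fraction = searched_fraction(630, 630, 2, 2)
fraction_20 = searched_fraction(630, 630, 20, 20)
```

With a skip of 2 the sampled grid holds close to one ninth of the pixels, and with a skip of 20 the share falls to roughly a quarter of one percent.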

During this minimal search process, once a foreground pixel is found, it is registered as the upper and
lower row as well as the left and right column of the newly identified object. As the search proceeds to the
next column on the right in the same horizontal row, if the next pixel is also found to be greater than 0,
then that column now becomes the rightmost. If the next pixel is found to equal 0, and to therefore be a
part of the background, then the preferred method returns backward by ½ "X" to check the pixel in
between the last detected foreground pixel and the first detected background pixel. If this pixel is greater
than 0, it becomes the rightmost column and "X" is reset temporarily to [value missing in source] and the
search proceeds again to the right. If the pixel was found to be equal to 0, then the method would again
search backward by ½ of ½ "X". Of course, at any time, if the fraction of "X" becomes less than 1 the
search ends. A similar strategy follows from the original detected foreground pixel as the search is
conducted downward to the next lowest row on the same column. However, for each additional pixel in
lower rows of the same original column that is determined to be greater than 0, the column by column
variable searching must be conducted in both directions. Therefore, the method is followed to examine
columns to the right and left.
It should be noted that once the first foreground pixel is found, the search continues and becomes
both a search to bound the foreground object as well as a search extending out to potentially find new
objects. In any case, ultimately one or more foreground objects will be found and an approximate
minimum bounding box 10m-bb will have been created by continually expanding the upper and lower
rows as well as the left and right columns. After this approximate box is found, the present inventors prefer
searching pixel by pixel along the upper, lower, left and right edges of the box. As the search takes place,
for each foreground pixel detected, the search will continue in the direction away from the box's interior.
In this fashion, portions of the foreground object that are extending out of the original approximate box
will be detected and therefore cause the box to grow in size. Ultimately, and by any acceptable steps, the
minimum box will be identified in Step 3a.
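The final edge-growing pass can be sketched as follows. This is a simplified illustration, not the inventors' procedure: the function name and the inclusive (top, bottom, left, right) box convention are assumptions, and the box is grown a whole row or column at a time whenever the ring of pixels just outside it contains foreground.

```python
def grow_box(mask, top, bottom, left, right):
    """Expand an approximate box (inclusive bounds) until the one-pixel ring
    just outside it contains no foreground (non-zero) pixels."""
    rows, cols = len(mask), len(mask[0])
    changed = True
    while changed:
        changed = False
        lo_c, hi_c = max(left - 1, 0), min(right + 1, cols - 1)
        lo_r, hi_r = max(top - 1, 0), min(bottom + 1, rows - 1)
        if top > 0 and any(mask[top - 1][c] for c in range(lo_c, hi_c + 1)):
            top -= 1
            changed = True
        if bottom < rows - 1 and any(mask[bottom + 1][c] for c in range(lo_c, hi_c + 1)):
            bottom += 1
            changed = True
        if left > 0 and any(mask[r][left - 1] for r in range(lo_r, hi_r + 1)):
            left -= 1
            changed = True
        if right < cols - 1 and any(mask[r][right + 1] for r in range(lo_r, hi_r + 1)):
            right += 1
            changed = True
    return top, bottom, left, right


# A 5 x 5 foreground patch; the coarse search only bounded its centre.
mask = [[1 if 2 <= r <= 6 and 3 <= c <= 7 else 0 for c in range(10)] for r in range(10)]
print(grow_box(mask, 4, 5, 4, 5))
```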

Step 4 involves examining small blocks of adjacent pixels from the subtracted image 10s in order to
determine their average grayscale value. Once determined, the average grayscale value of one block is
compared to that of its neighboring blocks. If the difference is below a dynamically adjustable threshold
value, then the corresponding pixel in the gradient image 10g is set to 0 (black); otherwise it is set to 256
(white). Thus, wherever there is a large enough change in contrast within the subtracted image 10s, the
pixels of the gradient image 10g are set to white, forming in effect a "line-drawing" of the foreground
object. Note that Step 3 may optionally be skipped in favor of creating the gradient image 10g directly
from the current image 10c, thereby saving processing time.
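A minimal sketch of the block comparison in Step 4 follows. The block size, the threshold and the function names are assumptions, as is the choice to compare each block with its four direct neighbors. The text gives 256 for white; the sketch uses 255, the largest 8-bit value.

```python
def block_means(img, size):
    """Average grayscale of each non-overlapping size x size block."""
    rows, cols = len(img) // size, len(img[0]) // size
    return [[sum(img[r * size + i][c * size + j]
                 for i in range(size) for j in range(size)) / size ** 2
             for c in range(cols)]
            for r in range(rows)]


def gradient_image(img, size=2, threshold=30, white=255):
    """One output pixel per block: white where the block's mean differs from
    any direct neighbor by at least `threshold`, black otherwise."""
    means = block_means(img, size)
    rows, cols = len(means), len(means[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbors = [means[rr][cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols]
            if any(abs(means[r][c] - n) >= threshold for n in neighbors):
                out[r][c] = white
    return out


# Dark left half, bright right half: only the boundary blocks turn white.
img = [[0, 0, 0, 0, 200, 200, 200, 200] for _ in range(4)]
print(gradient_image(img))
```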

Step 4a involves finding the minimum bounding box 2r-Bb that fully encloses the "line-drawing" created
in Step 4. The upper edge of the bounding box 10m-bb is determined by finding the "highest row" in the
image that contains at least one pixel of the "line-drawing." Similarly, the lower edge of the bounding box
10m-bb represents the "lowest row," while the left and right edges of the box represent the "leftmost" and
"rightmost columns" respectively. For this purpose, the present inventors prefer employing a method
exactly similar to that described in Step 3a. As will be shown in the following Step 5, this
bounding box 10m-bb is important as a means for removing from the current image 10c a lesser amount of
information containing a foreground object.

Step 5 involves using the calculated bounding box 10m-bb, regardless of whether it is based upon the
subtracted image 10s or the gradient image 10g, to remove, or "cut-out," from the current image 10c a
foreground block 10e. For each current image 10c being processed by element 26b of hub 26, the above
stated Steps may find anywhere from zero to many foreground blocks, such as 10e. It is possible that there
would be a single foreground block 10e that equaled the same size as the original image 10c. It is also
possible that a single foreground block 10e contains more than one player. What is important is that the
images 10c, being simultaneously captured across the multiplicity of camera assemblies, such as 20c-1,
would form a combined database too large for processing by today's technologies, and that this entire
stream of data is being significantly reduced to only those areas of the surface 2 where foreground objects
10e (players, referees, equipment, the puck, etc.) are found.

Step 6 involves the processing of each extracted foreground block 10e to further set any and all of its
detected background pixels to a predetermined value such as null, thereby creating scrubbed block 10es.
These pixels can be determined by comparing each pixel of block 10e to the background image 2r, similar
to the image subtraction of Step 3. Alternatively, the image 10s could be examined within its
corresponding bounding box 10m-bb. Any pixels of 10s already set to zero are background pixels and
therefore can be used to set the corresponding pixel of extracted block 10e to the null value.
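
The cut-out of Step 5 and the scrubbing of Step 6 can be illustrated with a short sketch. The function names and the use of Python's None as the null value are assumptions; the scrub follows the alternative described above, in which zero-valued pixels of the subtracted image mark background.

```python
def cut_out(image, top, bottom, left, right):
    """Copy the pixels inside an inclusive bounding box (Step 5)."""
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def scrub(block, subtracted_block, null=None):
    """Replace every pixel whose subtracted value is 0 with the null value (Step 6)."""
    return [[pix if sub != 0 else null for pix, sub in zip(block_row, sub_row)]
            for block_row, sub_row in zip(block, subtracted_block)]


current = [[5, 5, 5, 5], [5, 90, 80, 5], [5, 5, 70, 5]]
subtracted = [[0, 0, 0, 0], [0, 85, 75, 0], [0, 0, 65, 0]]
box = (1, 2, 1, 2)  # top, bottom, left, right
block = cut_out(current, *box)
print(scrub(block, cut_out(subtracted, *box)))
```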

Step 7 involves the conversion of scrubbed block 10es into a corresponding symbolic representation 10y,
as detailed above in the discussion of Fig. 5a. The present inventors prefer a representation 10y that
includes symbols for helmet sticker 9a, showing both its location and orientation, player 10's body and
stick, as well as puck 3.

Step 8 involves taking the remainder of the current image 10x, that has been determined to not contain any
foreground objects, in order to "refresh" the background image 2r. In the instances of sports, where the
tracking surface 2 may be for example frozen ice or a grassy field, the background itself may vary slightly
between successive current images 10c. This "evolving" of the background image can lead to successive
false indications of foreground object pixels. This Step 8 of "refreshing" includes copying the value of
the pixel in the remainder or "leftover" portion of image 10x directly back to the corresponding pixel of the
background image 2r. The preferred embodiment uses a second threshold to determine if the calculated
difference between a pixel in the background image 2r and the current image 10x is enough to warrant
updating the background 2r. Also in the preferred embodiment, the background is updated with all pixels
that are outside of the outermost edge of the "line-drawing" created in Step 4, rather than the bounding box
2r-Bb created in Steps 3a or 4a. As can be seen in the depiction of Step 5, there are non-foreground, i.e.
background, pixels that are encompassed by the minimum bounding box 2r-Bb. These pixels can also
contribute to the "refreshing" step.
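
The refresh of Step 8 can be sketched as below. The function name, the mask representation and the threshold value are assumptions; the sketch only copies a leftover pixel into the background when it differs from the stored value by at least the second threshold.

```python
def refresh_background(background, current, foreground_mask, refresh_threshold=3):
    """Return an updated background: non-foreground pixels of the current image
    replace the stored value when they differ by at least `refresh_threshold`."""
    updated = [row[:] for row in background]
    for r, row in enumerate(current):
        for c, value in enumerate(row):
            if foreground_mask[r][c]:
                continue  # pixel belongs to a foreground object; never copied
            if abs(value - background[r][c]) >= refresh_threshold:
                updated[r][c] = value
    return updated


background = [[10, 10], [10, 10]]
current = [[11, 20], [200, 10]]
mask = [[0, 0], [1, 0]]  # the bright pixel at (1, 0) is foreground
print(refresh_background(background, current, mask))
```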

As will be understood by those skilled in the art, there are great efficiencies to be gained by merging all of
the logical steps into a pixel-by-pixel analysis. Hence, rather than going through the entire image,
pixel-by-pixel, and performing Step 3 and then returning back to the first pixel to begin Step 3a, Step 4,
Step 4a, etc., Steps 1 through 8 can be performed in sequence on a single pixel or small group of pixels
before proceeding on to the next pixel or small group to redo the same sequence of Steps. The present
inventors prefer this approach because it supports the least amount of memory access versus processor
register to register movement and calculation.

Referring next to Fig. 6b, there is depicted the full-color upper portion 10fc of a player whose jersey,
helmet and face comprise, for example, four base color tones 10ct. It is typically the case in team sports
that the entire player and uniform would have a limited number of individual colors. For instance, color C1
could be white, color C2 could be flesh, color C3 could be black and color C4 could be orange. In the case
of color versus monochrome images, after all foreground objects such as 10e have been successfully
extracted, then in Step 1 hub 26 will optionally further deconstruct object 10e. In the depicted
example, full-color upper portion 10fc is separated into base color image 10bc in Step 1a and grayscale
image 10fg in Step 1b. This separation is conducted on a pixel-by-pixel basis as each pixel is compared to
the base color tone chart 10ct to find its nearest color. This comparison effectively determines the
combination of base tone 10ct, such as C1, C2, C3 and C4, and grayscale overlay that best accounts for the
original pixel value. The grayscale overlays are simply shading values between the minimum of 0, i.e. no
shading, and the maximum of 256, i.e. full shading.
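
The per-pixel separation can be sketched as a nearest-tone lookup plus a shading estimate. Everything concrete here is an assumption: the RGB values assigned to C1 through C4, the squared-distance metric, and the shading formula (how much darker the pixel is than its base tone, scaled to 0..255). The text does not specify how the overlay is computed.

```python
# Hypothetical RGB values for the four base tones of Fig. 6b.
BASE_TONES = {
    "C1": (255, 255, 255),  # white
    "C2": (224, 172, 105),  # flesh
    "C3": (0, 0, 0),        # black
    "C4": (255, 140, 0),    # orange
}


def nearest_tone(pixel):
    """Name of the base tone closest to `pixel` in squared RGB distance."""
    return min(BASE_TONES,
               key=lambda name: sum((p - t) ** 2 for p, t in zip(pixel, BASE_TONES[name])))


def separate(pixel):
    """Split a pixel into (base tone name, shading value in 0..255)."""
    name = nearest_tone(pixel)
    tone_level = sum(BASE_TONES[name]) / 3
    pixel_level = sum(pixel) / 3
    if tone_level == 0:
        return name, 0  # black cannot be shaded further in this simple model
    shade = round(255 * max(0.0, 1 - pixel_level / tone_level))
    return name, min(shade, 255)


print(separate((230, 230, 230)))  # a lightly shaded white pixel
print(nearest_tone((250, 130, 10)))
```

A plain RGB distance will assign a heavily shaded orange pixel to black, so a practical version would likely compare hue before brightness.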

This end result separation from an original extracted foreground image such as 10fc into its base color
image 10bc and grayscale image 10fg provides an additional significant advantage for image compression.
Traditional techniques typically rely upon a three byte encoding, for instance using one byte or 256
variations per each main color of red, green and blue (RGB). Hence, a 640 by 480 VGA resolution RGB
image that includes 307,200 total pixels requires 921,600 bytes of storage to encode full color. The present
invention's solution for effectively separating moving foreground objects from still and moving
backgrounds provides this subsequent opportunity to limit the total colors that must be encoded for any
given foreground pixel to a set of pre-known values. Hence, if the total base colors on both teams were less
than sixteen, the max color encoding would be four bits or one-half byte per pixel as opposed to three bytes
per pixel for RGB full color. Also, since studies have shown that the human eye has difficulty detecting
more than sixteen shades, the grayscale overlay image 10fg would require an additional four bits or
one-half byte per pixel. The present method taught herein would then require only ½ byte for the color
tone as well as ½ byte for grayscale. The resulting 1 byte per pixel is just 1/3 of the information used in a
traditional RGB method.
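
The storage arithmetic and the one-byte-per-pixel packing can be checked with a short sketch. The function names and the choice to place the tone index in the high four bits are assumptions made for illustration.

```python
def rgb_bytes(width, height):
    """Storage for a full-color image at three bytes per pixel."""
    return width * height * 3


def pack_pixel(tone_index, shade_level):
    """Pack a 4-bit tone index and a 4-bit shade level into one byte."""
    if not (0 <= tone_index < 16 and 0 <= shade_level < 16):
        raise ValueError("tone index and shade level must each fit in four bits")
    return (tone_index << 4) | shade_level


def unpack_pixel(byte):
    """Recover (tone_index, shade_level) from a packed byte."""
    return byte >> 4, byte & 0x0F


print(rgb_bytes(640, 480))  # three bytes per pixel
print(640 * 480)            # one byte per pixel with the packed encoding
print(unpack_pixel(pack_pixel(3, 12)))
```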

It should be emphasized that the present invention teaches the dropping of all background pixels, or at least
those outside of the minimum bounding box 10m-bb, such that the majority of pixels in any current image
are potentially compressed by 100%. With respect to the remaining foreground pixels that may be
potentially compressed by 66%, the present inventors prefer the creation of an additional color map 10cm
(Step 2a) and grayscale map 10gm (Step 2b). Note that the outermost edge of each map, 10cm and 10gm, is
identical to 10fc and represents the outline of the foreground image. By storing the inner edges belonging
to each map 10cm and 10gm, it is possible to simply record a single color or grayscale value representing
the color tone or grayscale value, respectively, of each interior region. This method shown in Steps 2a and
2b provides potential for further increasing the image compression by recording the outline of regions of
the same pixel value without requiring the interior regions to be encoded. Note a tradeoff between the
methods for encoding perimeters versus the method for minimally encoding individual successive pixels.
As long as the perimeter encoding method requires the same amount of data per perimeter pixel as required
to minimally encode a single pixel of an entire frame, then the perimeter approach will provide additional
compression opportunities. Note that upcoming Figs. 6f and 6g focus on two preferred perimeter encoding
methods.

The present inventors anticipate that the number of color tone regions needing to be encoded, as shown in
10cm, is mostly dependent upon the viewing angle of the image capture camera. Hence, the regions on a
jersey are fixed by design but will tend to break into smaller regions within the camera's view as a given
player bends and moves or is occluded by other players. However, the grayscale regions 10gm are more
directly under the control of the chosen extraction method. Hence, more regions will tend to be formed as
the allowed range of grayscale for any given region is lessened. This lessening of grayscale range will add
to the final resultant picture's realism while adding to the overall bandwidth to encode and transmit the
same information. The present inventors prefer an approach that dynamically adjusts both the levels of
grayscale detected and the smallest regions allowed. Hence, by choosing to distinguish eight grayscales
versus sixteen or thirty-two, it is anticipated that there will be fewer, larger regions in the map 10gm.
Again, these fewer regions will require fewer bytes to encode. Furthermore, adjacent regions determined to
be of minimal grayscale difference could be merged using the average grayscale as an additional technique
for minimizing region encoding.
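
The relationship between the number of grayscale levels and the number of regions can be illustrated in one dimension. This is a simplified sketch under stated assumptions: the function names are hypothetical, shading values are taken to lie in 0..255, and a "region" is reduced to a run of equal quantized values along a single row.

```python
def quantize(value, levels, max_value=256):
    """Map a shading value in [0, max_value) to one of `levels` equal-width bins."""
    return value * levels // max_value


def count_regions(row, levels):
    """Count runs of equal quantized shading along one row of pixels."""
    bins = [quantize(v, levels) for v in row]
    return sum(1 for i, b in enumerate(bins) if i == 0 or b != bins[i - 1])


row = [0, 20, 40, 60, 80, 100, 120, 140]  # a smooth shading ramp
print(count_regions(row, 8))   # coarse quantization merges neighbors
print(count_regions(row, 32))  # fine quantization keeps every pixel distinct
```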

Referring next to Fig. 6c, there is shown the same full color upper portion 10fc as depicted in Fig. 6b prior
to being separated into base color image 10bc and grayscale image 10fg. In this case, full color image 10fc
is first separated into all facial region 10cm-a (Step 1c) and full region with null-facial area 10cm-b (Step
2a). As will be presented especially in association with upcoming Figs. 11a through 11f, tracking system
100 provides detailed three-dimensional topological information that can be used to easily locate the area
of any extracted foreground object that is expected to include a player's face. For instance, when viewed
from overhead assemblies such as 20c, the player 10 depicted in perspective view full color image 10fc
includes a helmet sticker 9a. Once detected by tracking system 100, sticker 9a provides the viewed
player's identity. This same identity can be used to index to a database of preset player body and head
dimensions. Using such pre-stored player head dimensions as well as the detected (X, Y, Z) location of the
helmet sticker, hubs 26 are able to quickly estimate the maximum area within an extracted full color image
10fc that should include the player's face. (Note that since the head size of most players will be relatively
similar, the present inventors prefer using a preset global head size value for all players and participants.)
In addition to the measurement information leading to the location of facial region 10cm-a, the present
inventors also note that the skin color of a participant will be in its own distinct color tone(s), such as C2
shown in Fig. 6b. Hence, during extraction of full color image 10fc, hub 26 may also create a minimum
bounding box around any foreground areas where a known skin color tone, such as C2, is found. The
present inventors prefer first using the topological information to locate a given player's expected facial
region and then examining this maximum estimated region to see if it contains facial color tones. The
region can then be collapsed or expanded as needed based upon the results of this examination. In either
case, after extracting facial region 10cm-a from full color image 10fc, any foreground pixels determined to
be of non-facial color tones are set to null. Conversely, in the remaining full region with null-facial area
10cm-b, all foreground pixels determined to be of facial color tones are set to null.

It should be noted that the present inventors anticipate the use of the present invention in sports such as
basketball, where the players 10 do not wear helmets. As will be discussed with upcoming Fig. 14, tracking
system 100 has other methods for determining player identity apart from the use of a helmet sticker 9a. In
this alternate approach, the location of a player 10's head region will still be available via image analysis
and hence will be able to support the method taught in association with the present Fig. 6c. In the case of a
sport such as basketball, the present inventors prefer separating the head region as shown into 10cm-a and
representing the remaining portion of the player 10's body in full color region 10cm-b, even though it too
will contain flesh tones, such as C2.
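
The split into facial region 10cm-a and null-facial region 10cm-b can be sketched on a map of base tone names. The function name, the tone-name grid and the face box are assumptions. The sketch nulls only those facial-tone pixels that fall inside the estimated face box, which matches the basketball case where the body region keeps its flesh tones.

```python
def split_facial(tones, face_box, facial_tones=frozenset({"C2"})):
    """Return (region_a, region_b): region_a holds only facial-tone pixels inside
    the inclusive face box; region_b is the full map with those pixels nulled."""
    top, bottom, left, right = face_box
    region_a = [[None] * (right - left + 1) for _ in range(bottom - top + 1)]
    region_b = [row[:] for row in tones]
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            if tones[r][c] in facial_tones:
                region_a[r - top][c - left] = tones[r][c]
                region_b[r][c] = None
    return region_a, region_b


tones = [["C3", "C3", "C3"],
         ["C1", "C2", "C2"],
         ["C4", "C2", "C4"]]
face, rest = split_facial(tones, (1, 2, 1, 2))
print(face)
print(rest)
```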

Referring next to Fig. 6d, there is shown a stream 10es-cm of successive facial region sub-frames 10cm-a1
through 10cm-a8 representing a given time slice of captured player 10 activity. As discussed previously in
reference to Fig. 6c, a given extracted block 10es such as full color upper portion 10fc is expected to
contain a sub-region that includes at least some of a participant's face and hair. It is anticipated that this
minimum area containing the facial region 10cm-a of a player 10 will change in size due most often to
player 10 movement or zooming of the filming camera assembly. In either case, the net effect is the same
and will cause "zoomed-in" sub-frames such as 10cm-a4 or 10cm-a5 to be larger in terms of total pixels
than "zoomed-out" sub-frames such as 10cm-a1 or 10cm-a8. As will be understood by those skilled in the
art, in order to facilitate frame-to-frame compression, it is first desirable to align the centroids of each
individual sub-frame, such as 10cm-a1 through 10cm-a8, along an axis 10cm-Ax. Furthermore, each
sub-frame should also be placed into a standard size carrier frame 10cm-CF. Once each sub-frame 10cm-a1
through 10cm-a8 is centered inside an equal sized carrier 10cm-CF, it is then easier to find and map
overlapping compressible similarities between successive sub-frames.

Note that each sub-frame such as 10cm-a1 through 10cm-a8 carries with it the row and column absolute
pixel coordinates (r1, c1) and (r2, c2). These coordinates indicate where each sub-frame was lifted from
with respect to the original extracted block 10es, such as full color upper portion 10fc. Since each extracted
block 10es itself is also mapped to the original captured image frame 10c, then ultimately each facial
region sub-frame such as 10cm-a1 through 10cm-a8 can be refit back into its proper position in a
reconstructed image meant to match original image 10c.

Still referring to Fig. 6d, depending upon their size, individual sub-frames such as 10cm-a1 may take up
more or less space in the standard sized carrier frame 10cm-CF. For instance, sub-frame 10cm-a1 takes up
less space and would need to be expanded, or digitally zoomed, by 60% to completely fill the example
carrier frame 10cm-CF. On the other hand, sub-frame 10cm-a5 comes from an original image 10c that was
already zoomed in on the player 10 whose facial region 10cm-a it contains. Therefore, sub-frame 10cm-a5
would only need to be zoomed by 10%, for example, in order to completely fill the carrier frame 10cm-CF.
The present inventors prefer creating a single separate Stream A 10es-cm-db1 for each individual player
10 as identified by tracking system 100. For each sub-frame such as 10cm-a1 it is necessary to maintain
the associated absolute pixel coordinates (r1, c1) and (r2, c2) marking its extracted location along with its
centering offset and zoom factor within carrier frame 10cm-CF. As will be appreciated by those skilled in
the art, this information is easily obtained and operated upon and can be transmitted in association with
each sub-frame such as 10cm-a1 so that each sub-frame may be later "unpacked" and refit into a recreation
of original image 10c.
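
The bookkeeping that travels with each sub-frame can be sketched as a small record. The class and function names are hypothetical, the carrier size is an arbitrary example, and the sketch centers the sub-frame's bounding box rather than its true centroid, which is a simplification of the alignment described above.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    r1: int            # absolute coordinates of the sub-frame in its source block
    c1: int
    r2: int
    c2: int
    zoom: float        # uniform scale factor applied to fill the carrier frame
    offset_row: float  # top margin inside the carrier after zooming
    offset_col: float  # left margin inside the carrier after zooming


def place_in_carrier(r1, c1, r2, c2, carrier_rows, carrier_cols):
    """Compute the zoom and centering offset that fit a sub-frame into a carrier."""
    height, width = r2 - r1 + 1, c2 - c1 + 1
    zoom = min(carrier_rows / height, carrier_cols / width)
    offset_row = (carrier_rows - height * zoom) / 2
    offset_col = (carrier_cols - width * zoom) / 2
    return Placement(r1, c1, r2, c2, zoom, offset_row, offset_col)


# A 40 x 40 pixel sub-frame needs a 60% zoom to fill a 64 x 64 carrier.
print(place_in_carrier(100, 200, 139, 239, 64, 64))
```

Keeping the original corner coordinates alongside the zoom and offset is what lets a receiver shrink the sub-frame back and refit it into the reconstructed image.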

Referring next to Fig. 6e, the same stream 10es-cm depicted in Fig. 6d is shown as a series of successive
facial region sub-frames 10cm-a1 through 10cm-a8 centered along axis 10cm-Ax and expanded to
maximally fit into carrier frame 10cm-CF. In summary, the movement of these facial regions has been
"removed," first by extracting common compressible regions, second by aligning their centroids, and third
by expanding them to roughly the same sized sub-frame pixel area. While this resultant stream 10es-cm is
expected to be highly compressible using traditional "full motion" methods such as MPEG, it is
further expected to be even more compressible using standards such as XYZ that is used for "minimal
motion" video telecommunications. Hence, the present apparatus and methods teach a way of essentially
converting "full motion" video that is best compressed using techniques such as MPEG into "minimal
motion" video that can use simpler compression methods that typically experience significantly higher
compression ratios.

Referring next to Fig. 6f, there is shown the preferred layout of the identifying helmet sticker 9a as
attached, for example, to helmet 9 on upper portion 10fc of a typical player. Also depicted is single
identifying shape 9a-c that comprises inner circle 9a-ci encompassed by outer circle 9a-co. The circular
shape is preferred because the helmet sticker 9a is expected to traverse equally in every direction in
three-dimensional space. Therefore, by using a circle, each overhead assembly such as 20c will have the
maximum potential for locating and identifying a majority of each circular shape 9a-c. Assuming a
monochrome sensor 25b in overhead assemblies such as 20c, then inner circle 9a-ci is preferably filled in
with the shades depicted in three tone list 9a-3t or four tone list 9a-4t. Each tone list, 9a-3t and 9a-4t,
comprises black (0) or white (256) and a remaining number of grayscale tones spread equidistant between
black and white. This method provides maximum detectable differentiation between any two adjacent
inner circles 9a-ci. Depending upon the grayscale tone selected for inner circle 9a-ci, outer circle 9a-co is
filled in with either black or white, depending upon which tone will create the greatest contrast between
inner circle 9a-ci and outer circle 9a-co. This is important since preferred Step 4, depicted in Fig. 6a, will
cause inner circles 9a-ci on helmet sticker 9a to be outlined during the creation of gradient image 10g.
Presuming that sensor 25b detects color rather than monochrome, the present inventors anticipate
optionally using distinct colors such as red, blue and green in addition to black and white within circles
9a-ci or 9a-co.

There is further depicted helmet sticker view one 9a-v1, view two 9a-v2, view three 9a-v3 (which is
sticker 9a) and view four 9a-v4. Starting with view one 9a-v1, there is shown the preferred arrangement of
four circles 9a-c1, 9a-c2, 9a-c3 and 9a-c4. Similar to the rationale for using the circular shape, circles 9a-c1
through 9a-c4 are arranged to provide maximum viewing throughout all expected angles of orientation. It
is anticipated that not all of the circles 9a-c1 through 9a-c4 will always be within the current overhead
assembly view 20v, but it is expected that this arrangement will increase this likelihood. Further note that
since circles 9a-c1 and 9a-c4 are further apart from circles 9a-c2 and 9a-c3 (as depicted in view 9a-v1),
then image analysis in hub 26 can use this to determine a "front-to-back" versus "side-to-side" orientation.
The present inventors anticipate that other information detectable from extracted foreground blocks 10es of
players 10 will provide adequate information to determine the player's 10 orientation without relying upon
information from the helmet sticker 9a. Hence, while sticker 9a could be encoded so as to have a "front"
versus "back" direction, it is preferable to simply use the sticker 9a to determine the identity of player 10.
If tones are selected from chart 9a-3t, then each circle such as 9a-c1 can represent one of three distinct
values, therefore providing a maximum of 3 * 3 * 3 * 3 = 81 total combinations of tones. If tones are
selected from chart 9a-4t, then each circle such as 9a-c1 can represent one of four distinct values, therefore
providing up to 4 * 4 * 4 * 4 = 256 total combinations. Under those conditions where it would be
preferable to also determine the player 10's orientation using the helmet sticker 9a, then the present
inventors prefer limiting circle 9a-c1 to either black or white. In this case, circle 9a-c4 should be limited
to any gray tone (or color) but that chosen for 9a-c1. Therefore, the maximum number of unique encodings
would equal 1 (for 9a-c1) * 3 (for 9a-c4) * 4 (for 9a-c2) * 4 (for 9a-c3) = 48 possible combinations. With
this encoding, helmet sticker 9a, using the four quarter-tones of chart 9a-4t, would provide front-to-back
orientation as well as the identification of up to 48 participants.
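
The combination counts can be verified by enumeration. The function names and the numeric tone values are assumptions chosen only to stand in for the three-tone and four-tone charts. The oriented variant follows the factor of 1 that the text assigns to circle 9a-c1 by fixing it to a single anchor tone.

```python
from itertools import product


def sticker_codes(tones, circles=4):
    """Every assignment of one tone to each of the sticker's circles."""
    return list(product(tones, repeat=circles))


def oriented_codes(tones, anchor):
    """Codes (c1, c2, c3, c4) where 9a-c1 is fixed to `anchor` and 9a-c4 must
    differ from it, so the two ends of the sticker can be told apart."""
    return [(anchor, c2, c3, c4)
            for c2 in tones for c3 in tones for c4 in tones
            if c4 != anchor]


three_tones = [0, 128, 255]     # stand-in for chart 9a-3t
four_tones = [0, 85, 170, 255]  # stand-in for chart 9a-4t
print(len(sticker_codes(three_tones)))
print(len(sticker_codes(four_tones)))
print(len(oriented_codes(four_tones, anchor=0)))
```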

Referring next to Figs. 7a, 7b, 7c and 7d, there is depicted the simultaneous capture and extraction of
foreground blocks within a "four-square" grid of adjacent overhead camera assemblies, such as 20c-1,
20c-2, 20c-3 and 20c-4, each with a partially overlapping view, 20v-1, 20v-2, 20v-3 and 20v-4
respectively, of its neighbors. Within the combined view of the grid, there are players 10-1, 10-2 and 10-3
as well as puck 3. For the ease of description, it will be assumed that all of the camera assemblies, such as
20c-1, 20c-2, 20c-3 and 20c-4, are connected to a single hub, such as 26-1. This of course is not necessary,
as each of the cameras could just as well be processed by a different hub 26, sharing other camera
assemblies, such as 20c-5, 20c-6, etc., or even a single hub 26 per each of cameras 20c-1, 20c-2, 20c-3 and
20c-4.

Specifically referring to Fig. 7a, player 10-1 is seen to be in the lower right hand corner of field-of-view
20v1. After processing the Steps as described in Fig. 6, hub 26 returns extracted block 10e1 with corners at
(r1, c1) and (r2, c2). Hub 26 is preferably programmed to include Step 8 of searching the extracted blocks,
e.g. in this case 10e1, for player identification stickers such as 9a-1 on helmet 9-1. Note that because of the
minimally overlapping fields-of-view such as 20v1, 20v2, 20v3 and 20v4, players such as 10-1, 10-2 and
10-3 can be expected to "split" across these fields-of-view on a regular basis.

Referring next to Fig. 7b, players 10-1 and 10-2 form a single contiguous extracted block 10e2 while a
portion of player 10-3 forms block 10e3. Note that when more than one player, such as 10-1 and 10-2, are
overlapping from the camera's viewpoint, the group is treated as a single foreground block, regardless of
the number of players in the contiguous group (i.e. 1, 2, 3, 4, etc.). Hence, hub 26 is not trying to separate
individual players, such as 10-1 and 10-2, but rather trying to efficiently extract contiguous foreground
objects. Further note that helmet sticker 9a-3 of player 10-3 is only partially within view 20v-2. In
upcoming Fig. 7d, it will be shown that sticker 9a-3 is in full view of 20v-4. Thus, by ensuring that
fields-of-view such as 20v-2 and 20v-3 always overlap by at least the size of the identifying sticker, such
as 9a-3, it will always be the case that some hub, such as 26-1, will be able to determine the total number
and identities of all players in a foreground block, even if that block is split. Of course, this assumes that
the sticker, such as 9a-3, is sufficiently oriented to camera 25 so as to be accurately detected. While this is
not always expected to be the case, it is not required that the sticker be viewed in every frame in order to
track individual players.

Referring next to Fig. 7c, player 10-1 is seen to be in the upper right hand corner of field-of-view 20v3.
After processing, hub 26 transmits extracted block 10e4. Referring next to Fig. 7d, a portion of player 10-1
is extracted as foreground block 10e5 while puck 3 is extracted as block 10e6. Player 10-3 is also fully in
view and extracted as block 10e7. Note that puck 3 can form its own extracted block 10e6, either
completely or partially overlapping the minimum bounding box 2r-Bb of any other extracted block, e.g.
10e7.

Referring next to Fig. 7e, the final compilation and analysis of the stream of extracted foreground blocks
such as 10e1, 10e2, 10e3, 10e4, 10e5, 10e6 and 10e7 from hubs such as 26 is depicted. As previously
stated, there is significant benefit to ensuring that, for some statistical maximum percentage, each
extracted block as created by hubs such as 26-1 includes either "whole players" or "whole groups of
players." First, this allows hubs such as 26-1 to create an accurate symbolic representation 10y-1 of a
"whole player" or 10y-2&3 of a "whole group of players," residing completely within a single extracted
block such as 10e-1 or 10e-2&3, respectively. Without this benefit, tracking analysis computer 100c
must first receive and then recompile stream 10es so that it can then re-extract "whole players" and "whole
groups." Thus, by reducing the number of "splits," it is possible to eliminate the need for tracking analysis
computer 100c to receive, let alone process, stream 10es. Note that the few instances where a block split
will occur are expected to cause minimal degradation of the symbolic stream 10ys and the ensuing
performance analysis.

The second benefit of ensuring a statistical maximum of "whole" extracted blocks such as 10e-1 and
10e-2&3 is that the resulting stream 10es is simpler to process for the automatic content assembly &
compression system 900. For example, if "splitting" exceeds a necessary minimum in order to ensure
quality images, then after receiving extracted stream 10es, each with identical time stamps, system 900
must first "re-join" any detected "split" blocks into new joined boxes. System 900 would then proceed to
follow Steps exactly similar to 1 through 6 of Fig. 6b. In order to do this, hubs such as 26-1 would then be
required to additionally transmit the portions of background image 2r that corresponded to the exact pixels
in the extracted blocks, such as 10e, for any detected "split blocks." Thus, "splitting" will cause additional
processing load on hubs such as 26-1 and the content assembly system 900 as well as data transmission
loads through multiplexing hubs such as 28. All of this can be avoided by choosing the correct layout of
overhead camera assemblies 20c such that subsequent current images 10c sufficiently overlap to ensure
statistical maximums of "whole" extracted blocks 10e.

In either case, whether splitting is expected and prepared for, or whether increasing the overlap of
assemblies such as 20c statistically eliminates it, at least the content assembly & compression system 900
will perform the following steps on the incoming stream 10es.

Step 1 involves identifying each block such as 10e1 through 10e7 as belonging to the same time captured
instance, regardless of the capturing camera assemblies, such as 20c-1, or the portions of the tracking
surface 2 the block is associated with. Note that every hub, such as 26-1 and 26-2, will be in
synchronization with every assembly, such as 20c-1 through 20c-4 etc., that are all in further
synchronization with power curve 25p such that all current images 10c are for the concurrent instants in
time.

Step 2 involves mapping each block into a virtual single view, such as 20v-a, made up of the entire
multiplicity of actual views 20v, the size of the tracking area including surface 2 and any adjoining areas
such as 2e, 2f, 2g and 2h. Hence, coordinates (r1, c1) and (r2, c2) associated with each extracted block 10e
are translated through a camera-to-tracking-surface relationship table such that they now yield a unique set
of virtual coordinates such as (f[r1], f[c1]) and (f[r2], f[c2]). Since camera assemblies 20c have
overlapping fields-of-view 20v, some extracted blocks, such as 10e-1, may "overlay" other blocks, such as
10e-2, in the single virtual view 20v-a as it is constructed. After adjustments for image registration due to
partial off-axis alignment between adjacent image sensors 26b, the "overlaid" portions of one block, such
as 10e-1, on top of another block, such as 10e-2, will represent the same information. Hence, after piecing
each of the blocks, such as 10e1 through 10e7, onto single view 20v-a, system 900 will have created a
single virtual image as depicted in Fig. 7e.
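
By way of illustration only, Step 2 can be sketched as follows. The sketch assumes that the
camera-to-tracking-surface relationship table reduces to a fixed (row, column) offset per camera assembly,
which ignores the registration adjustments mentioned above. The names to_virtual, assemble_virtual_view
and offsets are hypothetical and do not appear in the specification.

```python
def to_virtual(camera_id, r, c, offsets):
    """Translate a camera pixel (r, c) into virtual-view coordinates (f[r], f[c]).

    Assumption: the relationship table is a plain per-camera offset.
    """
    row_off, col_off = offsets[camera_id]
    return r + row_off, c + col_off


def assemble_virtual_view(height, width, blocks, offsets):
    """Piece extracted blocks from one time instant onto a single virtual image.

    blocks: list of (camera_id, r1, c1, pixels), where (r1, c1) is the top-left
    corner of the block in its own camera image and 0 marks a null pixel.
    """
    virtual = [[0] * width for _ in range(height)]
    for camera_id, r1, c1, pixels in blocks:
        for dr, row in enumerate(pixels):
            for dc, value in enumerate(row):
                vr, vc = to_virtual(camera_id, r1 + dr, c1 + dc, offsets)
                # Overlaid portions carry the same information, so the first
                # non-null value written to a virtual pixel is kept.
                if value and not virtual[vr][vc]:
                    virtual[vr][vc] = value
    return virtual


offsets = {"20c-1": (0, 0), "20c-2": (0, 3)}
blocks = [("20c-1", 0, 2, [[5, 6]]), ("20c-2", 0, 0, [[6, 7]])]
view = assemble_virtual_view(1, 6, blocks, offsets)
```

Here the two blocks overlay at one virtual pixel, and the assembled row holds each player pixel once.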

As previously mentioned, if the extracted block stream 10es is not sufficiently free of "split" blocks, then
both tracking analysis computer 100c and content assembly & compression system 900 must now perform
Steps similar to 1 through 6 as discussed for Fig. 6a, which were already performed once by hubs such as
26-1. Again, in order to perform such Steps, at least including image subtraction Step 3 or gradient Step 4,
hubs such as 26-1 must additionally transmit the portion of the background image 2r that matches the
location of the minimum bounding boxes, such as 2r-Bb, of each extracted foreground block 10e for those
blocks determined to be "split." (This determination by hubs such as 26-1 can be made by simply marking
an extracted block, such as 10e, as "split" if any of its bounding edges touch the outer edge of the field-of-
view, such as 20v-1, of a particular camera assembly, such as 20c-1.)
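
The "split" determination described in the parenthetical above can be sketched as a single bounding-box
test. This is a minimal sketch, assuming pixel coordinates that run from 0 to rows-1 and 0 to cols-1
within one camera image; the function name is_split is hypothetical.

```python
def is_split(bounding_box, fov_rows, fov_cols):
    """Return True if any edge of the block's minimum bounding box touches
    the outer edge of the camera's field-of-view.

    bounding_box: (r1, c1, r2, c2), inclusive corners in image coordinates.
    """
    r1, c1, r2, c2 = bounding_box
    return r1 <= 0 or c1 <= 0 or r2 >= fov_rows - 1 or c2 >= fov_cols - 1
```

A block wholly inside the image is left unmarked, whereas a block whose box reaches row 0 or the last
column is marked "split" so that the matching background portion can be transmitted with it.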



As shown, the amount of regular "splitting" of players 10 is directly related to the percentage overlap of
adjacent camera assemblies 20c as depicted in Figs. 2, 3 and 5a. When the overlap is restricted to
minimally include the size of helmet sticker 9a, thereby requiring the fewest overall assemblies 20c,
then the splitting rate is statistically near maximum. In this case, image analysis computer 100c may only
expect to know the identity of every player within an extracted block, such as 10e1, 10e2, 10e3, etc.,
assuming the sticker is appropriately visible in the current frame 10c. Individually extracted blocks cannot
be expected to nearly always contain "whole players" or "whole groups of players." This particular design
of the maximum spread of camera assemblies 20c, and therefore minimal overlapping of fields-of-view 20v,
thus requires tracking analysis computer 100c to first join all adjacent blocks, such as 10e1 and 10e2,
before players such as 10-1 and 10-2 can be fully outlined as shown in Steps 3a and 4a of Fig. 6a. Later in
the present specification, especially during the discussion of Figs. 10a through 10h, two
different overhead layouts will be addressed that teach how to increase the overlap between adjacent
assemblies 20c. While these alternate layouts increase the total number of assemblies, such as 20c-1, 20c-2,
etc., required to view tracking surface 2, they will correspondingly decrease the statistical rate of player 10
"splitting," thereby reducing the work required by tracking analysis computer 100c.

Referring next to Fig. 8, there is shown the progression of information from current image 10c1 and 10c2,
to gradient image 10g1 and 10g2, to symbolic data 10s1 and 10s2, to graphic overlay 10v1 and 10v2. Prior
paragraphs of the present specification have discussed the steps necessary to go from a current image, such
as 10c1, to a gradient image such as 10g1, regardless of whether this is done completely in hubs 26, or first
in hubs 26 and again in content assembly & compression system 900 after reforming all extracted blocks in
stream 10es. As shown in Fig. 6f, helmet sticker 9a preferably comprises four circular shapes, each taking
on one of an allowed four distinct grayscale values, thereby forming 256 (4^4) possible identity codes, as
previously discussed. Depending upon the gray tone of its interior 9a-ci, each circle is surrounded by outer
circle 9a-co whose gray tone is chosen to create the highest contrast according to the 0 to 250 detectable
shades, thereby ensuring maximum shape recognition. When image 10c is processed to first get gradient
10g, these circles in sticker 9a will be detected since the difference between the surrounding grayscale and
the interior grayscale for each circle will, by design, always exceed the gradient threshold. Once the
individual circles are detected, the close, preset configuration of the four circles will be an indication of a
helmet sticker 9a and can be found by normal image analysis techniques. The centroid (rx, cx) of the four
detected circles will be used to designate the center of player 10's head 10sH. Sticker 9a is constructed to
include a detectable forward orientation and therefore can be used to determine the direction a player's 10
head is facing. This orientation information is potentially helpful during later analysis by performance
measurement system 700 as a means of helping to determine what play options may have been visible to
any given player 10 or referee.
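
As a hedged illustration of the sticker reading described above, the following sketch quantizes the
interior gray tone of each of the four detected circles to one of four levels, combines them into a
base-4 identity code, and takes the centroid of the circle centers as the head location. The four gray
levels, the fixed reading order of the circles and the names GRAY_LEVELS, nearest_level and
decode_sticker are assumptions made for the example only.

```python
GRAY_LEVELS = (0, 85, 170, 255)  # assumed values for the four allowed gray tones


def nearest_level(gray):
    """Index (0..3) of the allowed gray tone closest to a measured value."""
    return min(range(len(GRAY_LEVELS)), key=lambda i: abs(GRAY_LEVELS[i] - gray))


def decode_sticker(circles):
    """Decode a helmet sticker from its four detected circles.

    circles: four (row, col, interior_gray) tuples in a fixed reading order.
    Returns (identity_code, (rx, cx)), with identity_code in 0..255.
    """
    code = 0
    for _, _, gray in circles:
        code = code * 4 + nearest_level(gray)  # one base-4 digit per circle
    rx = sum(c[0] for c in circles) / len(circles)
    cx = sum(c[1] for c in circles) / len(circles)
    return code, (rx, cx)


code, head = decode_sticker([(10, 10, 250), (10, 14, 3), (14, 10, 90), (14, 14, 160)])
```

With four circles and four tones the code space is 4 x 4 x 4 x 4 = 256 distinct values.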

Assuming that there is only a single helmet sticker 9a found within the complete foreground object, such
as 10e, after the location of player 10's head 10sH is determined, an oval 10sB will be optimally fit around
the remaining portion of the foreground object's gradient outline. In the case where multiple helmet
stickers 9a are found within a single foreground object 10e, the assumption is that multiple players 10 are
in contact and therefore are forming a contiguous portion of image 10c. In this case, the edge of the
determined body ovals 10sB will be roughly midpoint between any two detected stickers 9a. In many
cases, simply by following the outline of the gradient image towards the line segment formed by two
neighboring players' helmet stickers, the limits of body ovals 10sB will be evident. Similar to the
orientation of the player's 10 head, using image analysis the body oval 10sB can be analyzed to determine
the orientation of the player's 10 shoulders. Specifically, oval 10sB will approach an elliptical shape as the
player stands upright. As is known, the sum of the distances of any point on an ellipse to the foci is
constant. This information can be used in combination with the fact that the front of player's 10 body, and
therefore the "front" of any representative ellipse, is oriented in the direction of the front of the helmet
sticker 9a. (Hence, the front of the body is always in the forward direction of the player's 10 head that can
be determined by the orientation of the sticker 9a.) By selecting multiple points along the "front" edge of
the player's 10 gradient 10g outline and for each point determining the sum of the distances to either side
of the base of the player's neck (assumed to be a fixed distance from the center of the helmet sticker 9a), an
average sum can be calculated providing the necessary equation for a shoulder ellipse. It should be noted
that this ellipse will tend to be equal to or less than the larger oval that encompasses the player's 10 body.
Again, it will be more nearly equal when the player is standing upright and be less as the player is bent over.
For this reason, the calculation of the ellipse should be made using "front" edge points of the player outline.
The difference between the edge of the ellipse and the oval, facing the backside of player 10, can be used
by performance measurement system 700 to determine valuable information concerning player stance.

Again referring to Fig. 8, the symbolic data 10s1 and 10s2 will also include the stick 10sS. The
configuration of pixels forming an extended, narrow straight line can be detected and interpreted as a
player's stick 10sS. Both end points of the detected stick 10sS can be used to define its location.
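
The shoulder-ellipse calculation described above can be sketched as follows. This is an illustration
under stated assumptions, not the inventors' implementation: the two foci are assumed to lie a fixed
distance to either side of the base of the neck, along the line perpendicular to the facing direction
given by the sticker. The names shoulder_ellipse and focus_offset are hypothetical.

```python
import math


def shoulder_ellipse(front_points, neck, facing, focus_offset):
    """Estimate a shoulder ellipse from points on the "front" edge of the outline.

    front_points: (x, y) points sampled along the front edge.
    neck: (x, y) of the base of the neck.
    facing: (dx, dy) forward direction taken from the helmet sticker.
    focus_offset: assumed distance from the neck to each focus.
    Returns (focus1, focus2, semi_major, semi_minor).
    """
    norm = math.hypot(facing[0], facing[1])
    # Unit vector along the shoulder line, perpendicular to the facing direction.
    px, py = -facing[1] / norm, facing[0] / norm
    f1 = (neck[0] + focus_offset * px, neck[1] + focus_offset * py)
    f2 = (neck[0] - focus_offset * px, neck[1] - focus_offset * py)
    # On an ellipse the sum of distances to the foci is constant (2a),
    # so the average sum over the sampled points estimates 2a.
    sums = [math.dist(p, f1) + math.dist(p, f2) for p in front_points]
    semi_major = (sum(sums) / len(sums)) / 2.0
    semi_minor = math.sqrt(max(semi_major ** 2 - focus_offset ** 2, 0.0))
    return f1, f2, semi_major, semi_minor


f1, f2, a, b = shoulder_ellipse([(0, 4), (3, 3.2), (-3, 3.2)], (0, 0), (0, 1), 3)
```

Comparing the resulting ellipse with the larger body oval 10sB then gives the stance difference
referred to above.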

Referring next to Fig. 9a, there are shown three players, 10-5, 10-6 and 10-7, each on tracking surface 2
within view of overhead tracking assembly 20c. When equipped with the proper lens, an assembly such as
20c affixed at twenty-five feet above the tracking surface 2 will have a field-of-view 20v of approximately
eighteen feet at roughly six feet off the ice surface. At the level of the tracking surface 2, the same field-
of-view 20v is approximately twenty-four feet wide. This distortion, created by the widening of field-of-
view 20v based upon the distance from the assembly 20c, limits the hub 26's ability to determine the exact
(X, Y) location of a detected foreground object such as helmet sticker 9a-5 on player 10-5. This is further
illustrated by the path of example ray 25r as it traverses from helmet sticker 9a-6 on player 10-6 straight
through helmet sticker 9a-5 on player 10-5. As is depicted in the inset top view, image analysis would
locate the helmet stickers 9a-6 and 9a-5 at the same X+n coordinate along the image frame. As will be
shown first in Fig. 9b and later in Figs. 10a through 10h, it will be necessary that each helmet sticker, such
as 9a-5 and 9a-6, be in view of at least two overhead assemblies such as 20c at all times. Since the relative
locations between all overhead assemblies 20c will be preset, hubs 26 will be able to use standard
triangulation techniques to exactly locate any foreground object as long as it is seen in two separate camera
assemblies' 20c fields-of-view 20v. This is especially helpful for foreground objects such as a helmet
sticker 9a or puck 3, for which the triangulation technique essentially provides three-dimensional 
information that can be used for additional critical measurements. 
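
The geometry of Fig. 9a can be checked with a short sketch. It assumes an idealized downward-looking
pinhole camera and works in one horizontal dimension only; the function names are hypothetical. The
first function reproduces the narrowing of the field-of-view with height (twenty-four feet at the
surface, roughly eighteen feet at six feet up, for a twenty-five foot mount). The remaining functions
show why a single view cannot separate X from height and how two views at preset locations recover both.

```python
def fov_width_at_height(surface_width, mount_height, height):
    """Width of the field-of-view at a given height above the tracking surface."""
    return surface_width * (mount_height - height) / mount_height


def project_to_surface(camera_x, mount_height, x, z):
    """Surface position at which a camera 'sees' an object located at (x, z)."""
    k = mount_height / (mount_height - z)
    return camera_x + (x - camera_x) * k


def triangulate(camera1_x, camera2_x, mount_height, apparent1_x, apparent2_x):
    """Recover true (x, z) from the apparent surface positions in two views."""
    k = 1.0 - (apparent1_x - apparent2_x) / (camera1_x - camera2_x)
    z = mount_height * (1.0 - 1.0 / k)
    x = (apparent1_x - camera1_x * (1.0 - k)) / k
    return x, z


width = fov_width_at_height(24.0, 25.0, 6.0)          # about 18.24 feet
a1 = project_to_surface(0.0, 25.0, 5.0, 6.0)          # sticker seen by camera 1
a2 = project_to_surface(12.0, 25.0, 5.0, 6.0)         # same sticker, camera 2
x, z = triangulate(0.0, 12.0, 25.0, a1, a2)
```

The two apparent positions differ because of parallax, and that difference alone determines the height.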

Also depicted in Fig. 9a is standing player 10-7 wearing helmet sticker 9a-7. Player 10-7 is shown to be
just on the edge of field-of-view 20v. In this position, any images 10c captured by assembly 20c will not
have a full view of helmet sticker 9a-7. As will be taught in the specification for Figs. 10a through 10h,
this will require that at certain field-of-view intersections, three overhead assemblies such as 20c must be
present, since at least one view will only partially include either the player 10-7 or their helmet sticker 9a-7.

Referring next to Fig. 9b, there are shown two adjacent overhead camera assemblies 20c-A and 20c-B.
When assemblies 20c-A and 20c-B are in Position 1, their respective fields-of-view 20v-A1 and 20v-B1
overlap at a point 20v-P1. Since overlap point 20v-P1 is lower than player height, it will be possible that a
given player such as 10-1 can stand at certain locations on the tracking surface 2, such as blind spot 20v-H,
and be essentially out of view of both adjacent assemblies' fields-of-view, such as 20v-A1 and 20v-B1.
These out-of-view locations will tend to be centered mid-way between adjacent assemblies 20c. In order to
eliminate this possibility, adjacent assemblies such as 20c-A and 20c-B can be closer in proximity, as
would be accomplished by moving 20c-B to depicted Position 2. By so doing, the new overlap point 20v-
P2 is raised to just include the expected maximum player height, thereby assuring that at least the player's
helmet sticker, such as 9a-1, will always be in view of one of the two adjacent assemblies' fields-of-view,
20v-A1 or 20v-B2.

However, as was previously taught, it is beneficial that the extracted foreground blocks 10e created from
the current images 10c as captured by assemblies such as 20c-A and 20c-B include the entire player, such
as 10-1. By so doing, there is less subsequent "stitching" work for the content assembly & compression
system 900. This is because system 900 will no longer be required to join extracted blocks containing partial
images of the same player such as 10-1, who was essentially straddling two adjacent fields-of-view, such
as 20v-A1 and 20v-B2. By further moving assembly 20c-B to Position 3, the new overlap point is now set
at 20v-P3, which is high enough so that a single player such as 10-1 will always be completely within one
adjacent assembly's field-of-view, such as 20v-A1 or 20v-B3. The present inventors prefer an even higher
overlap point such as 20v-P4, created by moving assemblies 20c-A and 20c-B still closer together. For
instance, with assembly 20c-A at Position 2, the resulting overlapping views 20v-A2 and 20v-B3 will be
sufficient to always include a small group of players such as 10-1 and 10-2.

As was previously stated, it is preferable that each player 10, or at least their helmet sticker 9a, be
constantly in view of at least two overhead assemblies such as 20c. As shown in Fig. 9b, there are
arrangements between two adjacent cameras that ensure that either the entire player 10, or at least their
helmet sticker 9a, is in view of at least one adjacent assembly, such as 20c, at all times. In the ensuing
paragraphs, it will be shown that it is necessary to add an additional "second layer" of assemblies with
offset fields-of-view in order to ensure that this same player 10, or their helmet sticker 9a, is always in
view of at least two assemblies, such as 20c.
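
The relationship between assembly spacing and the height of the overlap point in Fig. 9b can be
sketched as below. The sketch assumes two identical downward-looking assemblies at the same mounting
height whose fields-of-view narrow linearly with height; the twenty-five foot mount and twenty-four foot
surface width are borrowed from the Fig. 9a discussion, and the function names are hypothetical.

```python
def overlap_point_height(spacing, surface_width, mount_height):
    """Height at which two adjacent fields-of-view just meet.

    Below this height the views overlap; above it there is a gap between them.
    """
    return mount_height * (1.0 - spacing / surface_width)


def max_spacing_for(height, surface_width, mount_height):
    """Largest spacing that still keeps the overlap point at or above `height`."""
    return surface_width * (mount_height - height) / mount_height


edge_to_edge = overlap_point_height(24.0, 24.0, 25.0)   # overlap point at the surface
closer = overlap_point_height(18.0, 24.0, 25.0)         # overlap point raised
```

Moving the assemblies from twenty-four to eighteen feet apart raises the overlap point from the surface
to 6.25 feet in this idealized model, which is the effect of moving from Position 1 toward Position 2.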

Referring next to Fig. 9c, there is shown a perspective view of two overhead assemblies 20c-A and 20c-B
whose fields-of-view 20v-A and 20v-B, respectively, overlap on tracking surface 2. Specifically, once the
entire matrix of overhead assemblies such as 20c-A and 20c-B has been installed and calibrated, together
they will break the entire tracking surface into a grid 2-g of fixed locations, such as 2L-74394. Each
location, such as 2L-74394, represents the smallest recognizable area detectable by any individual
assembly such as 20c-A or 20c-B. The size of each location will be based primarily upon the chosen
distance between tracking surface 2 and assemblies 20c, optics 25a and image sensor 25c, as will be
understood by those skilled in the art. The present inventors foresee a location size equal to approximately
1/2 inch squared, which is equivalent to the minimal area covered by a pixel for the preferred configuration
of tracking system 100. What is most important is the additional information available to hubs such as 26
for the execution of foreground extraction steps such as depicted in Fig. 6a. Hence, with only a single view
20v of any given area of tracking surface 2, hub 26 can compare prior images of the background 2r with
current images 10c to help extract foreground objects 10c. However, with multiple views, such as 20v-A
and 20v-B, of the same area of tracking surface 2, hub 26 can now additionally compare portions of the
current image, such as 10c-A from assembly 20c-A, with portions of the current image, such as 10c-B from
assembly 20c-B.

Still referring to Fig. 9c, grid 2-g location 2L-74394 appears in four separate images as follows. First, it
appears as pixel location 10c-Ap54 in current image 10c-A of assembly 20c-A. Second, it appears as pixel
location 2r-Ap54 in background image 2r-A associated with assembly 20c-A. Third, it also appears as
pixel location 10c-Bp104 in current image 10c-B of assembly 20c-B. And fourth, it appears as pixel
location 2r-Bp104 in background image 2r-B associated with assembly 20c-B. (It should be noted that the
present inventors will teach the benefit of a triple overlapping view of the tracking surface during the
discussion of Figs. 10a through 10g. In this case a single grid location such as 2L-74394 would appear
in a third current and a third background image, further supporting foreground extraction.) The benefit of
using this additional information beyond the background 2r to current image 10c comparison discussed in
association with Fig. 6a will be taught in this Fig. 9c as well as Figs. 9d and 9e.

With respect to this benefit, and still referring to Fig. 9c, there is also depicted lighting source 23 that casts
rays 23r towards and upon tracking surface 2. As will be shown in the ensuing discussions of Figs. 9d and
9e, rays 23r in combination with moving foreground objects will cause shadows to fall upon individual
locations such as 2L-74394. These shadows may cause individual locations such as 2L-74394 to differ
from their stored background equivalents, such as 2r-Ap54 and 2r-Bp104. However, as will be shown, as
long as rays 2s-rA and 2s-rB reflecting off location 2L-74394 are not blocked on their path to assemblies
20c-A and 20c-B respectively, then location 2L-74394 will always be the same as represented by its
current image equivalents, such as 10c-Ap54 and 10c-Bp104. Hence, if for any given time instant the
comparison of 10c-Ap54 and 10c-Bp104 results in equality within a specified minimum threshold, then
the likelihood that both assemblies 20c-A and 20c-B are viewing the same tracking surface location 2L-
74394 is sufficiently high. Therefore, these given pixels 10c-Ap54 and 10c-Bp104 can be set to null values
with or without confirming comparisons to respective background pixels such as 2r-Ap54 and 2r-Bp104.

Referring next to Fig. 9d, there are shown the same elements of Fig. 9c with the addition of players 10-1
and 10-2. Players 10-1 and 10-2 are situated so as not to block the path of rays 2s-rA and 2s-rB as they
reflect off spot 2L-74394 into assemblies 20c-A and 20c-B. However, player 10-1 in particular is situated so
as to block illuminating rays 23r emitted by lamp 23, causing shadow 2s on tracking surface 2.
Furthermore, shadow 2s encompasses surface location 2L-74394 and as such causes current image pixel
10c-Ap54 to differ from stored equivalent background pixel 2r-Ap54. Likewise, current image pixel 10c-
Bp104 is caused to differ from stored equivalent background pixel 2r-Bp104. By using methods similar to
those described in Fig. 6a, the subtraction of current image 10c-A from background image 2r-A is
expected to occasionally result in the extraction of portions of shadow 2s, depending upon its intensity, as
depicted by extracted block 10e-A1. Notice that as depicted in Fig. 9d, there are no actual foreground
objects, such as player 10-1, that are currently in view of assembly 20c-A. Hence, the analysis of current
image 10c-A and stored background image 2r-A should ideally produce no extracted block 10e-A1.
Similarly, the subtraction of current image 10c-B from background image 2r-B is expected to occasionally
result in the extraction of portions of shadow 2s, depending upon its intensity, as depicted by extracted
block 10e-B1. In the case of assembly 20c-B as depicted, players 10-1 and 10-2 are in current view 10c-B
and would therefore ideally be expected to show up in extracted block 10e-B1. However, block 10e-B1
should not also include any portions of shadow 2s.

Still referring to Fig. 9d, by augmenting the methods first taught in Fig. 6a to additionally include the step
of comparing any given current pixel, such as 10c-Ap54, with its corresponding current pixel in any
adjacent assemblies, such as pixel 10c-Bp104 in assembly 20c-B, it is possible to reduce the detection of
shadow 2s as a foreground object. Hence, if the result of any such comparison yields equality within a
minimal tolerance, then those pixels, such as 10c-Ap54 and 10c-Bp104, can be assumed to be a portion of
the background, such as 2L-74394, and therefore set to null. Therefore, the methods and steps first taught in
Fig. 6a are here further taught to include the step of making the additional comparison of current image
10c-A, captured by assembly 20c-A, to current image 10c-B, from any adjacent overlapping assembly such
as 20c-B. Hence, the creation of extracted block 10e-A2 (which is empty or all null) is based upon current
image 10c-A, background image 2r-A and adjacent current image 10c-B. Likewise, the creation of image
10e-B2 (which only contains portions of players 10-1 and 10-2) is based upon current image 10c-B,
background image 2r-B and adjacent current image 10c-A. (Note, in the case of a third adjacent
overlapping assembly, similar to 20c-A and 20c-B, its current image would also be made available for
comparison.) The combination of all of this information increases the likelihood that any extracted blocks
contain only true foreground objects such as player 10-1 or puck 3, regardless of temporal lighting
fluctuations. For outdoor sports such as football, the shadows 2s formed on the tracking surface 2 are
expected to be potentially much more intense than the shadows created by indoor lighting such as depicted.
Hence, by using the calibrated foreknowledge of which current pixels, such as 10c-Ap54 and 10c-Bp104,
correspond to the same tracking location 2L-74394, the present invention teaches that these associated
current image pixels will track together throughout changing lighting conditions and will only be different
if one or more of their reflected rays is blocked by a foreground object such as player 10-1 or 10-2 or even
puck 3.
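
The additional cross-assembly comparison taught above can be sketched for a single grid location. This
is a minimal sketch assuming 8-bit grayscale pixel values and a fixed tolerance; the function name
classify_pixel and the tolerance value are hypothetical.

```python
def classify_pixel(current, adjacent_current, background, tolerance=8):
    """Classify one current pixel for a calibrated grid location.

    current: pixel from this assembly (e.g. 10c-Ap54).
    adjacent_current: corresponding pixel from the adjacent assembly (e.g. 10c-Bp104).
    background: stored background pixel for this assembly (e.g. 2r-Ap54).
    """
    # Both assemblies agree: they are viewing the same surface location, even
    # if a shadow has changed its brightness, so the pixel is set to null.
    if abs(current - adjacent_current) <= tolerance:
        return "background"
    # The views disagree, so at least one reflected ray is blocked. If this
    # view still matches its stored background, the other view is the blocked one.
    if abs(current - background) <= tolerance:
        return "background"
    return "foreground"


shadowed_a = classify_pixel(140, 140, 200)   # Fig. 9d: shadow seen by both assemblies
clear_a = classify_pixel(200, 60, 200)       # Fig. 9e: assembly A still sees the surface
blocked_b = classify_pixel(60, 200, 200)     # Fig. 9e: assembly B sees a player
```

A location that is both shadowed and blocked in one view would still be ambiguous with only two views,
which is one motivation for the third overlapping view noted in the parenthetical above.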



Referring next to Fig. 9e, there are shown the same elements of Fig. 9d except that players 10-1 and 10-2
are now situated so as to block assembly 20c-B's view of grid location 2L-74394 on tracking surface 2. In
so doing, it is significantly less likely that current pixel 10c-Bp104, now viewing a portion of player 10-1,
will identically match current pixel 10c-Ap54, still viewing tracking surface location 2L-74394.
Furthermore, when taken in total, hub 26 will have a substantially increased ability to detect foreground
pixels by comparing any single current pixel such as 10c-Bp104 to its associated background equivalent,
2r-Bp104, its associated current image equivalent 10c-Ap54, and that associated equivalent's background
pixel 2r-Ap54. (Again, as will be taught in upcoming Figs. 10a through 10h, with triple overlapping
views of all individual tracking surface locations such as 2L-74394, at least one other current pixel and
equivalent background pixel would be available for comparison.)

Still referring to Fig. 9e, there are also depicted recent average images 2r-At and 2r-Bt. Recent average
image 2r-At is associated with background image 2r-A and current image 10c-A. As was previously
taught in association with Fig. 6a, during Step 8 processing hub 26 "refreshes" background images, such
as 2r-A and 2r-B, with the most recent detected pixel values of all determined background locations such
as 2L-74394. This "refreshing" is simply the updating of the particular corresponding background pixel,
such as 2r-Ap54, with the most recent value of 10c-Ap54. (Note that in Fig. 9e, background pixel 2r-
Bp104 would not be similarly updated with the value of current pixel 10c-Bp104, since this pixel would be
determined to be representative of a foreground object.) This resetting of value allows the background
image to "evolve" throughout the sporting contest, as would be the case for an ice surface that becomes
progressively scratched as the game is played. This same purpose is beneficial for outdoor sports played on
natural turf that will have a tendency to become torn up as the game proceeds. In fact, many football games
are played in mud or on snow and consequently can create a constantly changing background.

However, in addition to the resetting of background images such as 2r-A with the most recently
determined value of a given tracking surface location, the present inventors teach the use of maintaining a
"moving average," as well as total "dynamic range," for any given location, such as 2L-74394. The
"moving average" represents the average value of the last "n" values of any given surface location such as
2L-74394. For instance, if the game is outdoors and the ambient lighting is slowly changing, then this
average could be taken over the last five minutes of play (300 seconds), amounting to an average over the
last 18,000 values when filming at 60 frames per second. The averages themselves can be compared to form
an overall trend. This trend will indicate if the lighting is slowly "dimming" or "brightening" or simply
fluctuating. Along with the average value taken over some increment, as well as the trend of averages, the
present inventors prefer storing a "dynamic range" of the min and max detected values that can serve to
limit the minimum threshold used to distinguish a background pixel, such as 10c-Ap54, from a foreground
pixel, such as 10c-Bp104. Specifically, when the current pixel such as 10c-Ap54 is compared to the
background pixel 2r-Ap54, it will be considered identical if it matches within the determined dynamic range,
unless the recent trend and last moving average value constrain the possibilities to a narrow portion of the
dynamic range. For example, even if the current pixel value, such as 10c-Bp104, for a given location such
as 2L-74394, is within the total min-max determined over the course of a game, since the outdoor lighting
has been
steadily decreasing, this value may be too bright to be consistent with the recent averages and trend of
averages. Hence, in order to provide maximum information for the extraction of foreground objects such as
players 10-1 and 10-2 from the background of the tracking surface 2, even when that background is
changing due to either surface degradation or changes in ambient lighting, the present invention teaches the
use of: 1) the current pixel from the current image and all overlapping images, 2) the associated
"refreshed" background pixel from the current image and all overlapping images, and 3) the "moving
average" pixel, along with its trend and "dynamic range."
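
One possible way to keep the per-location statistics named above is sketched below. It is an
illustration under stated assumptions rather than the inventors' method: the trend is taken as the change
between the oldest and newest stored moving averages, and the band and trend_limit parameters, like the
class name LocationHistory, are hypothetical.

```python
from collections import deque


class LocationHistory:
    """Moving average, trend and dynamic range for one tracking surface location."""

    def __init__(self, window):
        self.recent = deque(maxlen=window)     # last n background values
        self.averages = deque(maxlen=window)   # recent moving averages, for the trend
        self.lo = None                         # dynamic range: min detected value
        self.hi = None                         # dynamic range: max detected value

    def update(self, value):
        """Record a value that was determined to be background."""
        self.recent.append(value)
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)
        self.averages.append(sum(self.recent) / len(self.recent))

    def trend(self):
        """Positive when brightening, negative when dimming, near zero otherwise."""
        if len(self.averages) < 2:
            return 0.0
        return self.averages[-1] - self.averages[0]

    def matches_background(self, value, band=10, trend_limit=5):
        if self.lo is None:
            return False
        if abs(self.trend()) >= trend_limit:
            # A steady trend narrows the acceptable portion of the dynamic range.
            return abs(value - self.averages[-1]) <= band
        return self.lo <= value <= self.hi


dimming = LocationHistory(window=5)
for v in (200, 190, 180, 170, 160, 150, 140, 130):
    dimming.update(v)
```

In this example a value of 195 lies within the total min-max range of 130 to 200 but is rejected,
because the lighting has been steadily decreasing and the last moving average is 150.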

Finally, and still referring to Fig. 9e, there is shown extracted block 10e-A1, which is a result of comparisons
between the aforementioned current pixel information, such as 10c-Ap54 and 10c-Bp104, the background
information, such as 2r-Ap54, and moving average / dynamic range information such as 2r-Ap54t.
Likewise, there is shown extracted block 10e-B2, which is a result of comparisons between the
aforementioned current pixel information, such as 10c-Bp104 and 10c-Ap54, the background information,
such as 2r-Bp104, and moving average / dynamic range information such as 2r-Bp104t.

Referring next to Fig. 10a, there is shown a top view diagram of the combined view 22a covered by the
fields-of-view 20v-1 through 20v-9 of nine neighboring camera assemblies, such as 20c, laid out in a
three-by-three grid. This layout is designed to maximize coverage of the tracking surface while using the
minimal required assemblies 20c. This is accomplished by having each assembly's 20c field-of-view, such
as 20v-1 through 20v-9, line up to each adjacent field-of-view with minimal overlap as depicted.

As was taught in the prior paragraphs referring to Figs. 9a and 9b, it is mandatory that the fields-of-view,
such as 20v-1 and 20v-2, at least overlap enough so that their overlap point, such as 20v-P2 in Fig. 9b, is
no less than the maximum expected player height. In Fig. 10a, the edge-to-edge configuration of fields-of-
view 20v-1 through 20v-9 is assumed to be at the expected maximum player height, for instance 6' 11"
off tracking surface 2, resulting in overlap point 20v-P2, rather than at some lesser height resulting in an
overlap point such as 20v-P1. If Fig. 10a were depicted at the level of tracking surface 2, the same three-by-
three grid of fields-of-view 20v-1 through 20v-9 would be overlapping rather than edge-to-edge.

Referring next to Fig. 10b, fields-of-view 20v-1, 20v-4 and 20v-7 have been moved so that they now
overlap views 20v-2, 20v-5 and 20v-8 by area 20v-O1, representing an overlap point similar to 20v-P3
shown in Fig. 9b. The present inventors prefer this as the minimal overlap approach to ensuring that the
helmet stickers 9a on all players 10 are always in view of at least one field-of-view such as 20v-1 through
20v-9 in combined viewing area 22b.

Referring back to Fig. 10a, each edge between adjacent fields-of-view 20v-1 through 20v-9 has been
marked by a stitch line indicator ("X"), such as 20v-S. If a player such as 10 is straddling anywhere along an
edge denoted with indicator 20v-S, then their image will be split between neighboring fields-of-view, such
as 20v-1 and 20v-2, thereby requiring more assembly by content assembly & compression system 900 as
previously explained. To reduce this occurrence, one solution is to further increase the overlap area, as
depicted by the movement of fields-of-view 20v-3, 20v-6 and 20v-9 to overlap 20v-2, 20v-5 and 20v-8 by
area 20v-O2. This corresponds to overlap point 20v-P4, as shown in Fig. 9b, and increases the total
number of assemblies 20c required to cover the entire tracking surface 2. This is the preferable approach if
only a "single layer" of assemblies 20c is to be employed. In the ensuing paragraphs, the present inventors
will teach the benefits of adding additional "layers" of offset, overlapping camera assemblies 20c. As will
be discussed, while these approaches add significantly more assemblies 20c, they also provide significant
benefits not possible with the "single layer" approach. For instance, they allow for three-dimensional
imaging of the foreground objects such as helmet sticker 9a and puck 3. Furthermore, by overlapping
"layers," each individual layer can remain further spread out. Hence, overlap areas such as 20v-O1 will be
shown to be adequate, rather than the larger overlap areas such as 20v-O2.

This "second layer" approach is preferred and will ensure that each player 10's helmet sticker 9a will be in
view of at least two fields-of-view 20v at all times. By ensuring two views at all times, tracking analysis
system 100c will be able to more precisely determine sticker 9a's (X, Y) coordinates, as discussed in Fig.
9a, essentially because it will be able to triangulate between the views 20v of two adjacent assemblies 20c.
Furthermore, system 100c will also be able to determine the height (Z) of sticker 9a, thereby providing an
indication of a player's upright stance. The third (Z) dimension enabled by the "second layer" is also
extremely valuable for tracking the movement of puck 3 and stick 4. The following explanation of Figs.
10c, 10d, 10e and 10f teaches the addition of the "second layer."

Referring next to Fig. 10c, there is shown combined view 22c comprising a grid of four-by-four fields-of-
view 20v, each separated by overlap 20v-O1. This combined view 22c can be thought of as the "first
layer." Note that the overlap areas 20v-O1 between adjacent assemblies 20c are only in the vertical
direction. Referring next to Fig. 10d, the same combined view 22c is depicted slightly differently as four
elongated views, such as 20v-G, created by each group of four horizontally adjacent overlapping fields-of-
view 20v. This depiction better isolates the remaining problem areas where extracted image 10e "stitching"
will be required as players 10 move along the edges of each horizontally adjacent group, such as 20v-G.
These edges, such as 20v-SL, are denoted by the squiggle lines crossing them out. However, as will
be shown in Fig. 10e, rather than moving each of these groups, such as 20v-G, to overlap in the horizontal
direction similar to vertical overlap 20v-O1, a "second layer" of assemblies 20c will be added to reduce or
eliminate the stated problems.

Referring next to Fig. 10e, there is shown underlying first layer 22c, as depicted in Fig. 10d, with
overlying second layer 22d. Second layer 22d comprises a grid of three-by-three fields-of-view 20v
similar to combined view 22a in Fig. 10a. By adding second layer 22d, such that each field-of-view 20v in
layer 22d is exactly straddling the fields-of-view in underlying layer 22c, problems pursuant to
horizontal stitching lines such as 20v-SL are eliminated. The result is that the only remaining problem areas
are vertical stitching lines such as 20v-SL shown in Fig. 10f. However, the underlying first layer 22c is
also offset from second layer 22d in the vertical direction, thereby always providing overlapping fields-of-
view 20v along vertical stitching lines such as 20v-SL. Thus, the remaining problem spots using this
double layer approach are now reduced to the single stitching points, such as 20v-SP, that can be found at
the intersection of the horizontal edges of fields-of-view 20v in first layer 22c with the vertical edges of
fields-of-view 20v in second layer 22d.



Referring next to Fig. 10g, underlying first layer 22c remains unchanged while overlapping second layer
22d has now become layer 22e. Fields-of-view in layer 22e have been vertically overlapped similar to the
change made from combined view 22a in Fig. 10a to view 22b in Fig. 10b, assuming the vertical overlap
of 20v-O1. This final change to second layer 22e then removes the only remaining problems associated
with single stitching points such as 20v-SP. Referring next to Fig. 10h, underlying first layer 22c and
overlapping second layer 22e are depicted as single fields-of-view as if they represented one camera
assembly 20c for each layer. Note that the viewing area encompassed by overlapping layer 22e is now
considered to be available for tracking, whereas outlying areas outside the combined view of layer 22e are
not ideal for tracking even though they are still within view 22c. It is anticipated that these outlying areas
will be sufficient for tracking players such as 10 in team bench areas such as 2f and 2g or in penalty areas
such as 2h. Especially for redundancy principles, the present inventors prefer adding a third layer of
overhead tracking cameras overlapping first layer 22c and second layer 22e. This will ensure that if a
single camera assembly 20c malfunctions, whether on any layer such as 22c, 22e or the third layer not
shown, any given area of the tracking surface will still have at least two other assemblies 20c in proper
viewing order, thereby enabling three-dimensional imaging.

So as to avoid any confusion, since camera assemblies 20c in first layer 22c and second layer 22e are
physically offset, in practice they are preferably kept on the same horizontal plane. In this regard, the
camera assemblies themselves are not forming actual "physical layers," but rather their resulting fields-of-
view are forming "virtual layers."

Referring next to Fig. 11a, there is shown automatic game filming system 200, that accepts streaming
player 10, referee 12 and puck 3 location information from tracking database 101 (not depicted) into
center-of-view database 201. As it receives this continuous stream of individual foreground object
locations and orientations, system 200 dynamically determines what game actions to follow on the tracking
surface 2, such as the current location of the puck 3. System 200 then performs calculations on the tracking
data as it is received to determine which of its controlled filming stations, such as 40c, will have the best
view of the current and anticipated game action. (Hence, the present inventors anticipate that multiple
controlled filming cameras, such as 40c, will be placed around the sports venue to offer different vantage
points for filming, each of them controlled by the game filming system 200.) The calculations concerning
each station 40c's field-of-view are enabled by an initial calibration process that determines the (X, Y, Z)
coordinates of the fixed axis of rotation of each filming camera 45f within station 40c. These (X, Y, Z)
coordinates are expressed in the same local positioning system being used to calibrate the image analysis
and object tracking of system 100.
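
As a minimal sketch of the kind of calculation this calibration enables, the pan and tilt angles needed to aim a filming camera at a tracked location can be derived from the two (X, Y, Z) points alone. The angle conventions below (pan measured in the X-Y plane from the +X axis, tilt measured from the horizontal) and the function name are assumptions made for illustration; the disclosure does not specify them.

```python
import math


def pan_tilt_to_target(axis_xyz, target_xyz):
    """Return (pan, tilt) in degrees that aim a camera at target_xyz.

    axis_xyz is the calibrated fixed axis of rotation of the camera and
    target_xyz is the tracked location, both in the same venue coordinates.
    Pan is measured in the X-Y plane from the +X axis (assumed convention);
    tilt is measured from the horizontal, negative when looking down.
    """
    dx = target_xyz[0] - axis_xyz[0]
    dy = target_xyz[1] - axis_xyz[1]
    dz = target_xyz[2] - axis_xyz[2]
    pan = math.degrees(math.atan2(dy, dx))
    tilt = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return pan, tilt
```

For example, a camera whose axis is 10 units above the surface at the origin, aimed at a puck on the surface at (10, 10, 0), would pan 45 degrees and tilt roughly 35.26 degrees downward.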

As previously discussed, system 100 is able to determine the location, such as (rx, cx), of the center of the
player's helmet sticker 9a, which serves as an acceptable approximation of the current location of the player
10. Furthermore, system 100 could also determine the orientation of sticker 9a and body shape 10sB, and
therefore the "front" and "back" exposure of player 10. This information is valuable to system 200 as it
dynamically determines which of its controlled filming stations, such as 40c, is best located to film the on-
coming view of the player currently carrying the puck. Also valuable to system 200 are the identities of the
players, such as 10-1 and 10-2&3, currently on the tracking surface 2. These identities can be matched
against pre-stored information characterizing each player 10's popularity and relative importance to the
game action as well as tendencies to affect play by carrying the puck 3, shooting or checking. Given this
combination of detailed player 10 locations and orientation as well as identities and therefore game
importance and tendencies, system 200 can work to predict likely exciting action. Hence, while system 200
may always keep selected filming stations, such as 40c, strictly centered on puck movement, it may also
dedicate other stations similar to 40c to following key players 10 or "developing situations." For example,
system 200 could be programmed to follow two known "hitters" on opposing teams when they are detected
by the tracking system 100 to potentially be on a collision course.
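
One way such a "collision course" test might be sketched, purely for illustration, is a closest-point-of-approach calculation on the tracked positions and velocities of two players. The constant-velocity assumption, the contact radius, the look-ahead horizon and the function names are all hypothetical and are not part of the present disclosure.

```python
def closest_approach(p1, v1, p2, v2):
    """Return (time, distance) of closest approach of two players.

    Positions p1, p2 and velocities v1, v2 are (x, y) pairs on the tracking
    surface; both players are assumed to keep a constant velocity.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]    # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]    # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0:
        t = 0.0                               # no relative motion
    else:
        t = max(0.0, -(rx * vx + ry * vy) / speed_sq)
    dx, dy = rx + vx * t, ry + vy * t
    return t, (dx * dx + dy * dy) ** 0.5


def on_collision_course(p1, v1, p2, v2, radius=1.0, horizon=3.0):
    """True if the players come within `radius` in the next `horizon` seconds."""
    t, distance = closest_approach(p1, v1, p2, v2)
    return horizon >= t and radius >= distance
```

Two players skating directly at one another from 20 units apart at 5 units per second each would meet after 2 seconds and so would be flagged; the same players skating apart would not.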

In any event, and for whatever reason, once system 200 has processed tracking data from system 100 and
determined its desired centers-of-view 201, it will then automatically transmit these directives to the
appropriate filming stations, such as 40c, located throughout the playing venue. Referring still to Fig. 11a,
processing element 45a, of station 40c, receives directives from system 200 and controls the automatic
functioning of pan motor 45b, tilt motor 45c and zoom motor 45d. Motors 45b, 45c and 45d effectively
control the center-of-view 45f-cv of camera 45f. Element 45a also provides signals to shutter control 45e that
directs camera 45f when to capture images 10c. Note that it is typical for cameras capturing images for
video streams to take pictures at the constant rate of 29.97 frames per second, the NTSC broadcast
standard. However, the present invention calls for cameras that first synchronize their frames to the power
curve 25p, shown in Fig. 5b, and then additionally synchronize to the controlled camera movement.
Hence, stations 40c only capture images 10c when power curve pulse 25s occurs, ensuring sufficient,
consistent lighting, in synchronization with controlled movement of motors 45b, 45c and 45d, such that the
camera center-of-view 45f-cv is at a repeatable, allowed angle / depth. This tight control of image 10c
capture based upon maximum lighting and repeatable allowed viewing angles and depths allows for
important streaming video compression techniques as will be first taught in the present invention. Since
element 45a is controlling the rate of panning, tilting and zooming, it can effectively control the movement
of camera 45f, thereby ensuring that field-of-view 45f-cv is at an allowed viewing angle and depth at
roughly the desired image capture rate. As previously discussed, this rate is ideally an even multiple of
thirty (30) frames-per-second, such as 30, 60, 90, 120 or 240.
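
The notion of a "repeatable, allowed angle" can be illustrated with a small sketch in which a desired pan, tilt or zoom value is snapped to the nearest member of a fixed set of allowed settings. The uniform spacing of the allowed settings, the range limits and the function name are assumptions for illustration only; the disclosure does not state how the allowed settings are spaced.

```python
def snap_to_allowed(value, step, lo, hi):
    """Snap a desired setting to the nearest allowed value in [lo, hi].

    Allowed values are assumed to be lo, lo + step, lo + 2 * step, ... up to hi.
    """
    clamped = min(max(value, lo), hi)
    index = round((clamped - lo) / step)
    return lo + index * step
```

With allowed pan angles every half degree between -90 and 90, a requested pan of 12.3 degrees would be captured at 12.5 degrees, so that a pre-stored background image exists for exactly that setting.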

As camera 45f is controllably panned, tilted, zoomed and shuttered to follow the desired game action,
images such as 10cL, 10c, 10cR and 10cZ are captured of players, such as 10-1 and 10-2&3, and are
preferably passed to image analysis element 45g. Note that analysis element 45g, in stations 40c, is similar
to digital signal processor (DSP) 26b in image extraction hub 26 and may itself be a DSP. Also,
background image memory 45h, in stations 40c, is similar to memory 26c in hub 26. For each current
image 10c captured by camera 45f, image analysis element 45g will first look up the predetermined
background image of the playing venue, similar to 2r in Fig.'s 5a and 6, at the same precise pan and tilt
angles, as well as zoom depth, of the current center-of-view 45f-cv. In so doing, analysis element 45g will
perform foreground image extraction similar to Steps 3 through 6 of Fig. 6, in order to create extracted
blocks similar to 10e of Fig. 6. Note that the pre-stored background images, similar to 2r in Fig.'s 5a and
6, are first created by running system 200 prior to the presence of any moving foreground objects. In this
calibration phase, system 200 will automatically direct each camera 45f, in each station 40c, throughout all
of its allowed angles and zoom depths. At each allowed angle and depth, a background image will be
captured and stored in the background image memory 45h, which could be either computer memory or a
hard drive.
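
The calibration and lookup described above can be sketched as a store of background images keyed by the allowed (pan, tilt, zoom) setting. This is a simplified illustration: the class and function names are hypothetical, images are small grayscale grids, and a plain thresholded difference stands in for Steps 3 through 6 of Fig. 6, which are not reproduced here.

```python
class BackgroundStore:
    """Background images indexed by the allowed (pan, tilt, zoom) setting."""

    def __init__(self):
        self._images = {}

    def calibrate(self, pan, tilt, zoom, image):
        """Store the empty-venue image captured at this allowed setting."""
        self._images[(pan, tilt, zoom)] = image

    def lookup(self, pan, tilt, zoom):
        """Return the background stored for exactly this setting."""
        return self._images[(pan, tilt, zoom)]


def extract_foreground(current, background, threshold=20):
    """Keep pixels that differ from the background; others become None."""
    return [
        [c if abs(c - b) > threshold else None
         for c, b in zip(cur_row, bg_row)]
        for cur_row, bg_row in zip(current, background)
    ]
```

Because capture only occurs at allowed settings, the lookup is an exact key match rather than a search for the nearest stored view.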

During this calibration phase, it is best that the venue lighting be substantially similar to that used during
actual game play. Preferably, each camera 45f is also equipped with a standard light intensity sensor that
will capture the intensity of the ambient light of each current image 10c. This information is then passed
along with the current image, angles, and zoom depth to analysis element 45g. The light intensity
information can then be used to automatically scale the hue and saturation, or brightness and contrast, of
either the appropriately stored background image, such as 2r, or the currently captured image 10c. In this
way, if any of the venue lighting malfunctions or fluctuates for any reason during live filming, then current
image 10c can be automatically scaled to approximate the light intensity of the background image, such as
2r, taken during the calibration phase.
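
A minimal sketch of such scaling, assuming a single brightness gain equal to the ratio of the calibration-phase intensity to the currently measured intensity, is shown below. The gain model, the 8-bit clipping value and the function name are assumptions; the disclosure also contemplates scaling hue, saturation and contrast, which this sketch does not attempt.

```python
def scale_to_reference(image, measured_intensity, reference_intensity,
                       max_value=255):
    """Scale grayscale pixels so the image approximates calibration lighting.

    measured_intensity is the sensor reading for the current image and
    reference_intensity is the reading recorded with the background image.
    """
    if not measured_intensity > 0:
        raise ValueError("measured intensity must be positive")
    gain = reference_intensity / measured_intensity
    return [[min(max_value, round(p * gain)) for p in row] for row in image]
```

For instance, if the lights dim to 80 percent of their calibration level, every pixel of the current image would be multiplied by 1.25 before being compared with the stored background.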

Still referring to Fig. 11a, automatic filming stations such as 40c may optionally include compression
element 45i. This element may take on the form of a dedicated chip or a microprocessor, memory and
software. In any case, element 45i is responsible for converting either captured image stream 10c, or
foreground extracted blocks 10e, into a further compressed format for both efficient transmission and
storage. It is anticipated that the implemented compression of game film as stored in databases 102 and
202 could either follow an industry standard, such as MPEG, or be implemented in custom techniques
as will be disclosed in the present and upcoming patent applications of the present inventors.

Note that the present inventors also anticipate that the overhead tracking system 100 may operate its
camera assemblies, such as 20c, at or about one hundred and twenty (120) frames-per-second. In
synchronization with assemblies 20c, the automatic game filming system 200 may then operate its camera
stations, such as 40c, at the reduced rate of sixty (60) frames-per-second. Such a technique allows the
overhead tracking system 100 to effectively gather symbolic data stream 10ys in advance of filming
camera movements, as directed by game filming system 200. Furthermore, it is anticipated that while hubs
26 of tracking system 100 will create symbolic stream 10ys at the higher frame rate, they may also discard
every other extracted block from stream 10es, thereby reducing stream 10es's effective capture rate to sixty
(60) frames-per-second, matching the filming rate. This approach allows for a finer resolution of tracking
database 101, which has relatively small data storage requirements, while providing a video rate for storage
in overhead image database 102 and game film database 202 that is still twice the normal viewing rate of
thirty (30) frames-per-second. This doubling of video frames in databases 102 and 202 allows for smoother
slow-motion replays. And finally, the present inventors also anticipate that automatic game filming system
200 will have the dynamic ability to increase the capture rate of filming camera stations 40c to match the
overhead assemblies 20c. Thus, as performance measurement & analysis system 700 determines that an
event of greater interest is either currently occurring, or likely to occur, then appropriate notification
signals will be passed to automatic game filming system 200. System 200 will then increase the frame rate
from sixty (60) to one hundred and twenty (120) frames-per-second for each appropriate filming station
40c. Thus, automatic game film database 202 will contain captured film at a variable rate, dynamically
depending upon the detected performance of the sporting contest. This will automatically provide extra
video frames for slow and super-slow motion replays of anticipated important events in balance with the
need to maintain smaller storage requirements for film databases 102 and 202. This concept is applicable
regardless of the chosen frame rates. For example, the overhead assemblies 20c could be operated at sixty
(60) frames-per-second, rather than one hundred and twenty (120), while the filming assemblies 40c
would be operated at thirty (30) frames-per-second rather than sixty (60). Or, conversely, the frame rates
used for example in this paragraph could have been doubled rather than halved, as stated in the previous
sentence.
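
The two rate mechanisms described in this paragraph, discarding every other extracted block and raising the filming rate on notification, can be sketched as follows. The function names and the boolean notification flag are hypothetical simplifications of the signals passed from system 700 to system 200.

```python
def decimate(blocks, keep_every=2):
    """Keep every Nth extracted block, e.g. 120 frames-per-second to 60."""
    return blocks[::keep_every]


def filming_rate(event_of_interest, base_rate=60, boosted_rate=120):
    """Choose a station's capture rate from a notification flag."""
    return boosted_rate if event_of_interest else base_rate
```

One second of overhead capture at 120 frames-per-second would thus yield 60 stored blocks, while a filming station would switch from 60 to 120 frames-per-second only while an event of greater interest is flagged.
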
Referring next to Fig. 11b, there are shown the same elements as Fig. 11a with the additional depiction of
two overhead tracking assemblies 20c-A and 20c-B simultaneously viewing the same area of the tracking
surface 2 as the perspective view game filming camera 40c. As previously discussed, automatic game
filming system 200 maintains continuous control and orientation tracking for each filming station 40c.
Hence, the current center-of-view 45f-cv, for any given station 40c, is constantly known with respect to the
local three-dimensional (X, Y, Z) coordinate system used within a given venue by the present invention.
Based upon the center-of-view 45f-cv (X, Y, Z) coordinates, associated tracking system 100 can
continuously determine which overhead tracking assemblies, such as 20c-A and 20c-B, are filming in the
tracking area overlying the game filming assembly 40c's current and entire view. Furthermore,
tracking system 100 can use the current images, such as 10c-A and 10c-B, the background images, such as
2r-A and 2r-B, as well as the moving average / dynamic range images 2r-At and 2r-Bt of assemblies 20c-
A and 20c-B respectively, in order to create a three-dimensional topological profile 10tp of any foreground
objects within the current view of station 40c. As discussed previously and to be discussed further,
especially in association with Fig. 14, tracking system 100 is able to effectively determine the player, e.g.
10-1, location and orientation. For instance, starting with the helmet sticker 9a on player 10-1, as located
by both assemblies 20c-A and 20c-B, the tracking system 100 is able to calculate the three-dimensional (X,
Y, Z) location of the sticker 9a's centroid. Furthermore, from the downward view, system 100 is able to
determine the helmet 9 shape outline 10sH as well as the body shape outline 10sB and the stick outline
10sS, as taught with Fig. 8. Using stereoscopic techniques well known to those skilled in the art, system
100 can effectively create a topological profile 10tp of a player, such as 10-1, currently in view of a
filming station, such as 40c.
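
As an illustrative sketch of how two overhead views can yield a three-dimensional location, consider two downward-looking cameras at the same known height. Each camera's sight line through the sticker meets the playing surface at a slightly different point, and the shift between those two points along the line joining the cameras fixes the sticker's height. The idealized pinhole geometry, the equal camera heights and the function name are assumptions for illustration, not the calculation disclosed by the present inventors.

```python
import math


def triangulate_overhead(cam_a, cam_b, ground_a, ground_b):
    """Locate a point seen by two downward-looking cameras.

    cam_a, cam_b: (x, y, h) camera positions, assumed to share height h.
    ground_a, ground_b: (x, y) where each camera's sight line through the
    point meets the playing surface (z = 0).
    Returns the (x, y, z) location of the point.
    """
    ax, ay, h = cam_a
    bx, by, hb = cam_b
    if hb != h:
        raise ValueError("this sketch assumes cameras at equal height")
    base_x, base_y = bx - ax, by - ay
    baseline = math.hypot(base_x, base_y)
    # Shift between the two surface projections, measured along the baseline.
    disparity = ((ground_a[0] - ground_b[0]) * base_x
                 + (ground_a[1] - ground_b[1]) * base_y) / baseline
    z = h * disparity / (baseline + disparity)
    # Walk back along camera A's sight line from the surface to height z.
    scale = (h - z) / h
    x = ax + (ground_a[0] - ax) * scale
    y = ay + (ground_a[1] - ay) * scale
    return x, y, z
```

A point lying on the surface produces no shift between the two projections and therefore a height of zero; the taller the object, the larger the shift.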

Referring next to Fig. 11c, there are shown the same elements as Fig. 11b with the additional depiction of
topological projection 10tp placed in perspective as 10tp1 and 10tp2 aligned with filming station 40c's
center-of-view 45f-cv. As will be understood by those skilled in the art, tracking system 100 as well as all
other networked systems as shown in Fig. 1 are capable of accepting by manual input and sharing a three-
dimensional model 2b-3dm1 of the tracking venue. Model 2b-3dm1 preferably includes at least tracking
surface 2 and surrounding structure dimensions (e.g. with hockey the boards and glass 2b). Furthermore,
the relative coverage locations of overhead views, such as 20v-A and 20v-B, as well as locations of all
perspective filming cameras such as 40c and their associated current centers-of-view 45f-cv, are calibrated
to this same three-dimensional model 2b-3dm1. Thus, the entire calibrated dataset as taught by the present
inventors provides the necessary information to determine exactly what is in the view of any and all
filming cameras, such as 20c and 40c, at all times.

For the perspective filming cameras 40c, the current perspective view, such as 10c1, will only ever
contain one of two types of visual background information. First, it will be of a fixed background such as
Area F as depicted in corresponding projection 10c2 of current view 10c1. (For the sport of ice hockey,
Area F will typically be the boards 2b.) Or, second, the visual information will be of a potentially moving
background, such as Area M in corresponding projection 10c2 of current view 10c1. Fig. 11c addresses the
method by which the information collected and maintained in this calibrated database that associates exact
venue locations to camera views, such as 20v-A, 20v-B and 10c1, can be used to effectively determine
when a perspective filming station, such as 40c, is currently viewing some or all of a potentially moving
background area, such as Area M. This is important since a background area such as Area M may
potentially include moving spectators and is therefore more difficult to separate from the moving foreground
of players, such as 10-1 and 10-2&3, using only the methods taught in association with Fig. 6a.
Furthermore, Fig. 11c addresses how this same information can be used to create projections, such as
10tp1, of a foreground object, such as player 10-1, that partially overlaps a moving background such as
Area M; the projected region is referred to as Area O and should not be discarded.

Still referring to Fig. 11c, once the three-dimensional topological projection 10tp is created using
information from two or more overlapping overhead camera assemblies, such as 20c-A and 20c-B, current
view 10c1 may be broken into three possible visual information areas. As depicted in projection
10c2 of current view 10c1, these three visual information areas are Area O, Area F and Area M. Area
O represents that portion(s) of the current image 10c1 in which the topological projection(s) 10tp predicts
the presence of a foreground object such as player 10-1. Area F represents that portion of the current image
10c1 that is pre-known to overlie the fixed background areas already identified to the tracking system 100
and filming system 200 in three-dimensional model 2b-3dm1. The extraction of foreground objects, such
as player 10-1, from these areas exactly follows the teachings specifically associated with Fig. 6a as well as
Fig.'s 9c, 9d and 9e. Area M represents that portion of the current image 10c1 that is pre-known to overlap
the potentially moving background areas already identified to the tracking system 100 and filming system
200 in three-dimensional model 2b-3dm1.
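
The three-way division just described can be sketched as a per-pixel labeling that combines two boolean masks: one marking where the projected topological profile falls, and one marking where the venue model places potentially moving background. The mask representation and the function name are assumptions made for illustration.

```python
def classify_areas(profile_mask, moving_mask):
    """Label each pixel "O", "F" or "M".

    profile_mask[r][c] is True where the topological projection predicts a
    foreground object; moving_mask[r][c] is True where the venue model marks
    potentially moving background. The profile takes precedence.
    """
    labels = []
    for prof_row, mov_row in zip(profile_mask, moving_mask):
        labels.append([
            "O" if in_profile else ("M" if in_moving else "F")
            for in_profile, in_moving in zip(prof_row, mov_row)
        ])
    return labels
```

Note that a pixel labeled "O" may still overlie either fixed or potentially moving background; only the latter case requires the additional processing taught below.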

The extraction of foreground objects, such as player 10-1, performed by image analysis element 45g of
station 40c from the portions of image 10c1 corresponding to Area M, includes a first step of simply
setting to null, or excluding, all pixels contained outside of the intersection of Areas M and O. The degree
to which the profile exactly casts a foreground object's outline, such as that of player 10-1, onto the projected
current image, such as 10c2, is affected by the amount of processing time available for the necessary
stereoscopic calculations. As processing power continues to increase, hubs such as 26 will have the capability
in real-time to create a smooth profile. However, hub 26 will always be limited to the two-dimensional
view of each overhead assembly, such as 20c-A and 20c-B. For at least this reason, image analysis element
45g will have an additional step to perform after effectively discarding Area M. Specifically, those portions of
Area O that overlap the entire possible range of Area M, depicted as Region OM, must be additionally
processed in order to eliminate likely moving background pixels that have been included in Area O. The
method for the removal of moving background pixels from Region OM includes a first step of
eliminating any pixels that are outside of the pre-known base color tones 10ct as previously defined in
association with Fig. 6b. Once these pixels have been removed, all remaining pixels in Region OM are
assured to be in the possible color range for the anticipated foreground objects. The identity of the
participant, such as player 10-1, is ideally available to analysis element 45g during this first step so that the
color tones 10ct are further restricted to the appropriate team or referee colors.

After this initial removal of pixels outside of the participant(s) color tone table 10ct, all pixels in
Region OM are assumed to be a part of the foreground object and by design will appear to the observer to
match the appropriate colors. A second step may also be performed in which pre-captured and stored
images of Area M, exactly similar to stored images of Area F, are compared to Region OM. This is helpful
in the case that Area M may be either empty, or only partially filled with potentially moving objects, such
as spectators 13.
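
The treatment of Area M described above can be sketched as a single pass over the image: pixels of Area M outside the profile are set to null, and pixels of Region OM are kept only if their tone appears in the participant's color tone table. For simplicity this sketch assumes each pixel has already been quantized to a tone label and that table membership is an exact match; the function name and data layout are hypothetical.

```python
def filter_area_m(image, profile_mask, moving_mask, allowed_tones):
    """Null moving-background pixels, leaving fixed-background pixels alone.

    image[r][c] is a tone label; profile_mask and moving_mask are boolean
    grids of the same shape; allowed_tones is the participant's tone table.
    """
    result = []
    for row, prof_row, mov_row in zip(image, profile_mask, moving_mask):
        new_row = []
        for tone, in_profile, in_moving in zip(row, prof_row, mov_row):
            outside_profile = in_moving and not in_profile
            wrong_tone = in_moving and in_profile and tone not in allowed_tones
            new_row.append(None if outside_profile or wrong_tone else tone)
        result.append(new_row)
    return result
```

Pixels over fixed background are returned unchanged here because their extraction follows the separate teachings associated with Fig. 6a.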

Referring next to Fig. 11d, there is shown a top view diagram depicting the view of perspective filming
station 40c as shown in Fig.'s 11a, 11b and 11c as it captures an image of a player 10-1. Also shown is
topological projection 10tp in relation to the top view of player 10-1, whose orientation is measured with
respect to the center-of-view 45f-cv. As taught in Fig. 11a, filming station 40c ultimately receives images
onto sensor 45s. In Fig. 11d, a pixel grid representing sensor 45s is shown with current image 10c2. (Note
that current image 10c2 as shown is meant to exactly match the perspective view 10c1 captured by 40c as
shown in Fig. 11c.)

Calculated projection 10tp has been overlaid onto current image 10c2 and is referred to as 10tp2. As
previously discussed and as will be understood by those skilled in the art, once the locations of the fixed
overhead assemblies, such as 20c-A and 20c-B as shown in particular in Fig. 11c, are calibrated to the
fixed rotational axis of all perspective assemblies, such as 40c, then the calculated profile 10tp2 of
foreground objects such as 10-1, in simultaneous view of both the overhead and perspective assemblies, can
be assigned pixel-by-pixel to the current images, such as 10c2. This of course requires an understanding of
the exact pan and tilt angles of rotation of perspective assemblies, such as 40c, about their calibrated fixed
rotational axis, along with the assemblies' current zoom depth (as discussed especially in association with
Fig. 11a).

Still referring to Fig. 11d, current captured image 10c2 can be broken into two distinct portions referred to
as Area F and Area M. As discussed in relation to Fig. 11c, Area F corresponds to that portion of the image
whose background is known to be fixed (and generally considered to be within the "field-of-play").
Conversely, Area M corresponds to that portion of the image whose background is potentially moving (and
generally considered to be outside of the "field-of-play"). The movement within Area M is typically
expected to be due to the presence of spectators 13 (as depicted in Fig. 11c). The knowledge of the
boundary lines between Area F and Area M is contained within three-dimensional model 2b-3dm2. As
will be understood by those skilled in the art, model 2b-3dm2 can be determined through exact
measurements and pre-established with tracking system 100 and made available via network connections
to filming system 200 and all associated systems depicted in Fig. 1.

Referring next to Fig. 11e, there is shown the same overhead view of filming station 40c as it views player
10-1 that was first shown in Fig.'s 11a through 11c. Now added to this top view are boards 2b just behind
player 10-1. Shown further behind boards 2b are three spectators 13. Note that in hockey, the lower portion
of the boards 2b is typically made of wood or composite materials and is opaque, and is therefore a part
of fixed background Area F. However, the upper portion of boards 2b is typically formed using glass
panels held in place by vertical metal channels. Since it is possible that stations such as 40c will be filming
players such as 10-1 while they are within view of this upper glass portion of the boards 2b, nearby
spectators such as 13 may show within the current view 10c2. As previously taught, it is greatly
beneficial to the overall compression of images, such as 10c2, that the foreground objects be extracted
from any and all background image portions including visible spectators 13. Fig. 11e shows that the side-to-
side edges of player 10-1, which are contained in profile 10tp2, can delineate that portion of Area M that is
expected to contain a foreground object, such as player 10-1. This foreground region is labeled as 10c-OM.
Conversely, no foreground objects are expected to be found in that portion of Area M known to be
outside of profile 10tp2, which is labeled as 10c-Mx. Hence, all pixels determined by use of pre-known three-
dimensional venue model 2b-3dm2 to be within potentially moving background Area M and further
determined to be outside of foreground region 10c-OM can be set to null value and effectively ignored
during analysis (as will be further illustrated in Fig. 11f).

Referring next to Fig. 11f, this concept is illustrated in greater detail. Specifically, image 10c2 as portrayed
in Fig. 11e is first enlarged for discussion. Next, image 10c2 is broken into two portions based upon all
pixels known to be in Area F, shown below 10c2 as 10c2-F, versus Area M, shown above 10c2 as 10c2-M.
Any foreground objects may be extracted from image portion 10c2-F using techniques previously taught
especially in relation to Fig. 6a. For image portion 10c2-M, the first step as first discussed in relation to
Fig. 11c is to separate that portion of the image that overlaps the topological profile 10tp2. This separation
yields region OM labeled as 10c2-OM and shown separately above image portion 10c2-M. That portion of
Area M not contained in region 10c2-OM is not expected to contain any foreground objects and is labeled
as 10c-Mx and its pixels may be set to null value. And finally, after separating out region 10c2-OM, the
second step is to use the color tone table, such as 10ct shown in Fig. 6b, to examine each pixel in the
region. Player 10-1 in region 10c2-OM is depicted to comprise four color tones C1, C2, C3 and C4. Any
pixels not matching these pre-known color tones are discarded by setting them to null. Thus only
foreground pixels, along with a minimal amount of moving background pixels, will be extracted. This
minimal amount of moving background pixels is expected to come from image segments such as 10c-OMx
and represents colors on spectators 13 that match the color tone table 10ct. Using edge detection
methods well known to those skilled in the art, it is possible to remove some of the background pixels
belonging to spectators 13 and matching color tone table 10ct, especially if they come off of player 10-1 in
a discontinuous manner. Whether or not these particular background pixels are fully removed, the present
inventors anticipate that their presence will represent relatively minor image artifacts that will go largely
unnoticed as game movement continues.

Referring next to Fig. 11g, there is shown the same overhead view of filming station 40c as it views player
10-1 in front of boards 2b and spectators 13 that was shown in Fig. 11e. Filming station 40c is now
referred to as 40c-A. Added to its right side is stereoscopic perspective filming assembly 40c-B that
functions exactly similar to any station 40c as previously described. Stations 40c-A and 40c-B are jointly
mounted onto rack 40c-R. As will be appreciated by those skilled in the art, the pan and tilt motions of
assemblies 40c-A and 40c-B can either be integrated via rack 40c-R or remain separately controlled while
rack 40c-R remains fixed. The present inventors prefer a fixed rack 40c-R with separately controlled pan
and tilt of assemblies 40c-A and 40c-B. In either case, both assemblies 40c-A and 40c-B are operated to
continually follow the center-of-play as predetermined based upon overhead tracking information
contained in tracking database 101. Each assembly 40c-A and 40c-B, as previously described for all
assemblies 40c, will have synchronized its image captures to a limited number of allowed pan and tilt
angles as well as zoom depths. Theoretically, since assemblies 40c-A and 40c-B are under separate
operation and their movements, while similar, will necessarily not be identical, it is possible that they will
not be capturing images at the exact same moment in time. The present inventors prefer an approach that
favors controlling the pan, tilt and zoom motions of 40c-A and 40c-B to ensure simultaneous capture. This
will necessitate instances when both cameras are not identically directed towards the predetermined center-
of-play. However, as will be well understood by those skilled in the art, these relatively minor "non-
overlaps" will only affect the edges of the resultant images 10c2-A and 10c2-B that for other reasons such
as perspective and inclusions were already less ideal for stereoscopic analysis.

Still referring to Fig. 11g, assemblies 40c-A and 40c-B capture simultaneous, overlapping images 10c2-A
and 10c2-B respectively. Based upon pre-calibrated information available in three-dimensional model 2b-
3dm2, each current image 10c2-A and 10c2-B is first broken into Area F, containing the known fixed
background, and Area M, containing the potential moving background as previously taught. Inside of Area
M can be seen visible portions of spectators 13. Working in tandem with the fixed overhead assemblies
such as 20c-A and 20c-B, each current image 10c2-A and 10c2-B is also overlaid with topological
projections 10p2-A and 10p2-B respectively. Each topological projection 10p2-A and 10p2-B defines
Area O within images 10c2-A and 10c2-B respectively. Within each Area O are images 10-1A and 10-1B
of player 10-1 and small visually adjoining portions of background spectators 13. Selected visible portions
of player 10-1, such as exterior edge point 10-1Ee, are simultaneously detected by stereoscopic assemblies
40c-A and 40c-B as depicted as points 10-1Ee-A and 10-1Ee-B in images 10c2-A and 10c2-B
respectively. As is well known in the art, stereoscopic imaging can be used for instance to determine the
distance between each assembly 40c-A and 40c-B and exterior edge point 10-1Ee. For that matter, any
distinctly recognizable feature found in both images 10c2-A and 10c2-B that resides on a foreground
object such as 10-1 can be used to determine the distance to that feature and therefore player 10-1. The
present inventors are aware of other systems attempting to use stereoscopic imaging as a primary means
for locating and tracking the positioning of players, such as 10-1. As is taught in this and prior related
applications, the present inventors prefer using the overhead tracking system to determine player location.
The main purpose for the addition of stereoscopic assembly 40c-B as shown in Fig. 11g is to provide
additional information for edge detection along the perspective view of all foreground objects such as 10-1
in the primary image 10c2-A, especially as they are extracted out of moving backgrounds with spectators
such as 13. This additional information is depicted as moving background points 13-Ee-A and 13-Ee-B.
Specifically, background point 13-Ee-A will show just to the left of point 10-1Ee-A within image 10c2-A.
Similarly, point 13-Ee-B will show just to the left of point 10-1Ee-B within image 10c2-B. Since
these points are physically different parts of the background, there is a probability that their pixel values
will differ upon comparison, especially when taken along the entire edge of foreground objects such as
10-1. Since point 10-1Ee within images 10c2-A and 10c2-B will show up with highly similar color tone
and grayscale components, this dissimilarity between 13-Ee-A and 13-Ee-B will be a strong indication of
a non-foreground pixel, especially if either background pixel's color tone is not in the list of pre-known
tones as discussed in relation to Fig. 6b. Furthermore, either of these points 13-Ee-A and 13-Ee-B may
match their respective pre-known background image pixels associated with the current pan, tilt and zoom
coordinates of their respective assemblies 40c-A and 40c-B. This will also be a strong indication that the
point is not a foreground pixel. Hence, in combination with the pre-known backgrounds associated with
images 10c2-A and 10c2-B as taught especially with respect to Fig. 11a, this second offset stereoscopic
image 10c2-B is anticipated to further help identify and remove moving background points such as
13-Ee-A from main image 10c2-A.
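
The two cues described above (a companion-pixel mismatch between the stereoscopic images, and a match
against the pre-known background) can be sketched as follows. This is only an illustrative sketch and
not the inventors' implementation: the function names, the RGB tuple representation and the tolerance
value are assumptions, and the pre-known tone list of Fig. 6b is omitted for brevity.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]  # hypothetical (R, G, B) representation


def color_distance(p: Pixel, q: Pixel) -> int:
    """Largest per-channel difference between two pixels."""
    return max(abs(a - b) for a, b in zip(p, q))


def is_probably_background(pixel_a: Pixel, pixel_b: Pixel,
                           known_bg_a: Pixel, known_bg_b: Pixel,
                           tolerance: int = 12) -> bool:
    """Classify an edge pixel using the two cues from the text.

    pixel_a / pixel_b: companion pixels from the main and offset images.
    known_bg_a / known_bg_b: pre-captured background pixels for the
    current pan, tilt and zoom of each assembly (assumed to be given).
    tolerance: assumed threshold, not a value from the specification.
    """
    # Cue 1: either pixel matches its own pre-known background image.
    if color_distance(pixel_a, known_bg_a) <= tolerance:
        return True
    if color_distance(pixel_b, known_bg_b) <= tolerance:
        return True
    # Cue 2: the companion pixels disagree, as expected when the two
    # skewed views see different parts of the moving background.
    return color_distance(pixel_a, pixel_b) > tolerance
```

A pixel that agrees in both views and matches neither stored background is left as a foreground
candidate for the standard edge detection.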

Referring next to Fig. 11h, the present inventors depict in review the use of topological profile 10p2-A to
remove the portion of Area M outside the profile 10p2-A. Those pixels outside of Area O as defined by
profile 10p2-A are set to null and ignored. Also depicted in Fig. 11h are exterior edge point 10-1Ee-A and
interior region edge point 10-1Re-A. While interior region point 10-1Re-A is along the edge of the
foreground object such as player 10-1, it differs from exterior edge point 10-1Ee-A in that this portion of
the edge of player 10-1 is not viewable, or not easily viewed, from the overhead assemblies such as 20c.
Essentially, within Area M, within topological profile 10p2-A, the edges including points such as
10-1Re-A cannot rely upon information from the overhead image analysis of tracking system 100 in order
to help separate foreground from moving background pixels.
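
The nulling of pixels outside Area O can be sketched as a simple mask operation. This is a minimal
sketch under assumptions: the image is taken to be a list of rows, the topological profile is assumed to
have already been rasterized into a boolean mask of the same shape, and `apply_profile_mask` is a
hypothetical name.

```python
def apply_profile_mask(image, inside_profile):
    """Keep pixels inside the profile; set all others to None (null).

    image: list of rows of pixel values.
    inside_profile: list of rows of booleans, True where the pixel lies
    inside Area O (assumed to be precomputed from the projection).
    """
    return [
        [pixel if keep else None for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, inside_profile)
    ]
```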

Referring next to Fig. 11i, there is shown in review Region OM, a subset of Area M as enclosed by
topological projection 10p2-A. Within Region OM, which contains primarily foreground objects such as
10-1, there is anticipated to be a small area along the edges of the captured image of player 10-1 that will
spatially adjoin portions of the background spectators 13 that have not been removed via the profile
10p2-A; for instance, to the left of point 10-1Ee-A. Since the topological profiles such as 10p2-A are
calculated based upon the overhead view of the upper surfaces of players such as 10-1, it is possible that
there will be sizable portions of Region OM that will contain background spectator 13 pixels. For instance,
if from the perspective view of an assembly such as 40c-A, a player 10-1's arm is outstretched in Region
OM, then the upper surface will limit the depth to which the calculated topological profile such as 10p2-A
extends down
towards Area F. This situation is expected to occur frequently and will create larger portions of Region
OM, shown as internal region 10-1-Ir-A, where moving background pixels may be visible in image
10c2-A. In the example of Fig. 11i, portions of spectators 13 can be viewed in image 10c2-A directly under
the player 10-1's outstretched arm but still above the top of Area F. These moving background pixels of
spectators 13 ideally need to be separated from the foreground image of 10-1 in an efficient manner. As
will be understood by those skilled in the art, the capturing of stereoscopic image 10c2-B will provide
slightly skewed views of the moving background such as spectators 13 behind foreground player 10-1.
This skewing increases the probability that the same spatially located pixel in images 10c2-A and 10c2-B
will contain different portions of the actual moving background, such as spectators 13 or pre-known
background. The present inventors anticipate that the comparison of companion stereoscopic pixels in
image 10c2-B against those of 10c2-A during the standard edge detection will result in higher accuracy in
less computing time.

Referring next to Fig. 11j, there is shown a top view drawing of tracking surface 2 surrounded by boards
2b. Inside the playing area defined by tracking surface 2 can be seen player 10-1 while outside are
spectators 13. Perspective filming assemblies rack 40c-R, as first shown in Fig. 11g, has been augmented
to include third filming assembly 40c-C. Similar to assembly 40c-B, assembly 40c-C collects stereoscopic
images simultaneous to main filming assembly 40c-A. As was discussed in relation to Fig. 11g through
Fig. 11i, the use of additional stereoscopic assemblies 40c-C and 40c-B provides additional comparison
pixels such as would represent spectator 13 points 13-Ee-C and 13-Ee-B, respectively. This additional
moving background information, especially in combination with pre-captured background images
corresponding to the current pan, tilt and zoom coordinates of assemblies 40c-A, 40c-B and 40c-C, helps
to remove unwanted moving background pixels.

Also depicted in Fig. 11j are additional angled overhead assemblies 51c-A through 51c-J that are oriented
so as to capture fixed images of any potential moving background just over the edge of playing surface 2
and, in the case of ice hockey, boards 2b. Specifically, each angled overhead assembly such as 51c-B is
fixed such that its perspective view 51v-B overlaps each adjacent angled overhead assembly such as 51c-A
and 51c-C's perspective views as shown. Thus all angled overhead assemblies such as 51c-B form a single
contiguous view of the boundary region just outside of the tracking surface 2. Preferably, each view such
as 51c-B is large enough to cover at least some portion of tracking surface 2 or, in the case of ice hockey,
boards 2b. Furthermore, each view should encompass enough of the background so as to include any
portions of the background any filming assembly such as 40c-A might potentially view as it pans, tilts and
zooms. Therefore, assemblies such as 51c-B are set at an angle somewhere between that of the perspective
filming assemblies such as 40c-A and a directly overhead tracking assembly such as 20c.
Similar to techniques taught by the present inventors for overhead tracking assemblies such as 20c, each
angled overhead assembly such as 51c-B is capable of first capturing a background image corresponding to
its fixed angled viewing area 51v-B prior to the entrance of any moving background objects such as
spectators 13. During the ongoing game, as moving background objects such as 13 pass through the view
of a given overhead assembly such as 51c-B, using image subtraction techniques such as taught in relation
to Fig. 6a, the tracking system can determine which background image pixels now represent a moving
background versus a fixed background. As will be understood by those skilled in the art, with proper
calibration, overhead assemblies such as 51c-B can be mapped to the specific background images
pre-captured by filming assemblies such as 40c-A that correspond to the same portions of the playing
venue. In practice, any given filming assembly such as 40c-A will have a limited panning range such that it
effectively will not film 360 degrees around the tracking surface. For instance, filming assemblies 40c-A,
40c-B and 40c-C may only be capable of panning through backgrounds viewed by angled overhead
assemblies 51c-A through 51c-G. Regardless of the exact mappings, what is important is that the angled
overhead assemblies such as 51c-B provide key additional information concerning the potential moving
background area that may at times be in view of one or more filming assemblies such as 40c-A.
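
The image subtraction step for a fixed angled overhead assembly can be sketched as below. This is a
simplified sketch, not the method of Fig. 6a itself: it assumes grayscale images stored as lists of rows,
and the function name and threshold are illustrative assumptions.

```python
def moving_background_mask(reference, current, threshold=15):
    """Flag pixels that differ from the pre-captured background image.

    reference: grayscale image captured before any spectators entered.
    current: grayscale image of the same fixed view during the game.
    threshold: assumed minimum difference that counts as a change.
    Returns rows of booleans, True where the background is now moving.
    """
    return [
        [abs(cur - ref) > threshold for ref, cur in zip(ref_row, cur_row)]
        for ref_row, cur_row in zip(reference, current)
    ]
```

Because the angled assemblies never pan, tilt or zoom, a single reference image per assembly suffices
in this sketch.
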
By capturing this information continuously, a mapped database can be maintained between the angled
images such as encompassed by view 51c-B and the stored pre-captured background images for the
corresponding pan, tilt and zoom coordinates appropriate to each filming assembly such as 40c-A that is
capable of viewing the identical area. In some instances, as players such as 10-1 approach the edge of the
tracking surface, or in the case of ice hockey come against boards 2b, the views of angled overhead
assemblies such as 51c-B will be partially blocked. However, due to their higher placement angles, fixed
assemblies 51c-B will always detect more of the moving background than perspective assemblies such as
40c-A. Furthermore, as will be understood by those skilled in the art, since the fixed assemblies are
constantly filming the same area as encompassed by views such as 51c-B, they can form a model of the
spectators 13 including their colors and shading. As will be understood by those skilled in the art, by using
motion estimation techniques and preset determinations concerning the range of possible motion between
image frames, blocked views of spectators can be adequately predicted, thereby facilitating moving
background pixel removal in Region OM.

Referring next to Fig. 12, there is shown an interface to manual game filming 300 that senses filming
orientation and zoom depth information from fixed manual filming camera assembly 41c and maintains
camera location & orientation database 301. Interface 300 further accepts streaming video from fixed
assembly 41c for storage in manual game film database 302. During calibration, camera location &
orientation database 301 is first updated to include the measured (x, y, z) coordinates of the pan / tilt pivot
axis of each fixed filming assembly 41c to be interfaced. Next, the line-of-sight 46f-cv of fixed camera 46f
is determined with respect to the pan / tilt pivot axis. It is possible for the pivot axis to be the origin of the
line of sight, which is the preferred case for the automatic filming stations 40c discussed in Fig. 11a. Once
confirmed, this information is recorded in database 301. During a game, it is expected that fixed camera
46f will be forcibly panned, tilted and zoomed by an operator in order to re-orient line of sight 46f-cv and
therefore current image 10c. As camera 46f is panned, optical sensors, typically in the form of shaft
encoders, can be used to determine the angle of rotation. Likewise, as fixed camera 46f is tilted, optical
sensors can be used to determine the angle of elevation. Such techniques are common and well understood
in the art. International patent PCT/US96/11122, assigned to Fox Sports Productions, Inc., specifies a
similar approach for determining the current view of manual filming cameras at a sporting event. By
adding additional electronics to the zoom controls 46t on camera 46f, the zoom depth of the current image
10c may also be detected. Processing element 46a is responsible for taking the current pan and tilt readings
along with the current zoom depth and updating image analysis element 46g that is constantly receiving
current images 10c from fixed camera 46f via splice 46x. The first goal, similar to that proposed by
Fox Sports in the aforementioned patent, is to simply record the detected viewing angle and depth for each
acquired image 10c. This information becomes useful when attempting to determine what potential players
and game objects were in the view of each individual manual filming camera similar to 46f. The system
described in the Fox patent was only capable of tracking the movement of the game object, such as puck 3,
and did not specify a solution for tracking players, such as 10. As such, it was primarily concerned with
understanding where the tracked game object, such as puck 3, was in the current manually captured image
10c. The present invention further specifies the necessary apparatus and methods for tracking and
identifying individual players, such as 10, and referees, such as 12. It is anticipated that as manual game
film is collected, database 302 not only stores the individual frames 10c but also the corresponding
orientation and depth of the camera 46f field-of-view 46f-cv. Using this stored camera orientation and
depth information, the tracking system 100 can determine which players and referees were in which
camera views at any given moment. System 100 is further able to determine, for the visible players 10 and
referees 12, their orientation with respect to each fixed camera, such as 46f, and therefore whether
or not the current view 46f-cv is desirable. Automatic content assembly & compression system 900 will
use this information to help automatically select the best camera angles to be blended into its encoded
broadcast 904. This mimics the current human based practice in which a producer views continuous feeds
from multiple manual filming cameras and then determines which views contain the most interesting
players and camera angles for the currently unfolding game play.
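
Determining whether a tracked player falls within a recorded camera view can be sketched geometrically.
The sketch below is an illustration under stated assumptions rather than the disclosed method: the
field-of-view is approximated as a circular cone around the line of sight, the half-angle is assumed to
have been derived from the zoom depth, and the angle conventions (pan measured in the x-y plane from the
x axis, tilt measured upward from that plane) and function names are hypothetical.

```python
import math


def line_of_sight(pan_deg, tilt_deg):
    """Unit vector for a camera's line of sight (assumed conventions)."""
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    return (math.cos(tilt) * math.cos(pan),
            math.cos(tilt) * math.sin(pan),
            math.sin(tilt))


def in_view(camera_xyz, pan_deg, tilt_deg, half_fov_deg, target_xyz):
    """True if the target lies within the cone around the line of sight."""
    offset = tuple(t - c for t, c in zip(target_xyz, camera_xyz))
    distance = math.sqrt(sum(v * v for v in offset))
    if distance == 0:
        return True  # target at the pivot point itself
    los = line_of_sight(pan_deg, tilt_deg)
    cos_angle = sum(a * b for a, b in zip(los, offset)) / distance
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= half_fov_deg
```

For example, a camera at (0, 0, 5) panned to 0 degrees and tilted 10 degrees downward with a 10 degree
half-angle would see a player at (20, 0, 1) but not one at (0, 20, 1).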

Also referring to Fig. 12, the present inventors anticipate modifying the typical manually operated filming
assembly, such as 41c, so that it is panned, tilted and zoomed via an electronic control system as opposed
to a manual force system. This concept is similar to the flight controls of a major aircraft whereby the pilot
manually operates the yoke but is not physically connected to the plane's rudders and flaps. This
"fly-by-wire" approach uses the yoke as a convenient and familiar form of "data input" for the pilot. As the
pilot adjusts the yoke, the motions are sensed and converted into a set of control signals that are
subsequently used to automatically adjust the plane's flying control mechanisms. In a similar vein, the
present invention anticipates implementing a "film-by-wire" system for manually controlled assemblies,
such as 41c. This approach will allow the operator to, for instance, move a joystick and view the camera
film through a monitor or similar screen. As movements are input through the joystick, the processing
element sends the necessary signals to automatically adjust the camera's position via panning and tilting
motors as well as electronic zoom control. This is similar to the automatically controlled stations 40c
specified in Fig. 11a. With this approach, the manual filming camera is also modified to only capture
images 10c at allowed pan / tilt angles and zoom depths, again similar to automatic filming stations 40c.
Image analysis element 46g is then able to recall pre-captured and stored background images from memory
that correspond to the current camera orientation and depth. As was taught for automatic filming stations
40c, this technique of
limiting current images 10c to those with matching background images provides a means for greater video
compression by compressor 46i that uses the backgrounds to extract minimal foreground information as
discussed in Fig. 6a.
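
The "film-by-wire" restriction to allowed settings can be sketched as snapping each joystick request to
the nearest allowed value and then looking up the stored background for that setting. This is a minimal
sketch: the function names, the example angle lists and the dictionary used as a background store are all
assumptions for illustration.

```python
def nearest_allowed(requested, allowed):
    """Return the allowed value closest to the requested one."""
    return min(allowed, key=lambda value: abs(value - requested))


def film_by_wire(requested, allowed_pans, allowed_tilts, allowed_zooms,
                 backgrounds):
    """Snap a requested (pan, tilt, zoom) and fetch its background.

    backgrounds: hypothetical mapping from an allowed (pan, tilt, zoom)
    tuple to the background image pre-captured at that setting.
    Returns (snapped_setting, background_or_None).
    """
    pan, tilt, zoom = requested
    setting = (nearest_allowed(pan, allowed_pans),
               nearest_allowed(tilt, allowed_tilts),
               nearest_allowed(zoom, allowed_zooms))
    return setting, backgrounds.get(setting)
```

Since every captured image then has a matching stored background, foreground extraction can proceed as it
does for the automatic stations.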

Referring next to Fig. 13, there is shown an interface to manual game filming 300 that senses filming
orientation and zoom depth information from roving manual filming camera assembly 42c and maintains
camera location & orientation database 301. Interface 300 further accepts streaming video from roving
assembly 42c for storage in manual game film database 302. Camera location & orientation database 301
is first updated during calibration to include the measured (x, y, z) coordinates of predetermined
line-of-sight 47f-cv of each roving filming camera 47f to be interfaced. Each camera's line-of-sight 47f-cv
will be predetermined and associated with at least two transponders 47p-1 and 47p-2 that are attached to
roving camera 47f. As will be understood by those skilled in the art, various technologies are either
available or coming available that allow for accurate local positioning systems (LPS). For instance, radio
frequency tags can be used for triangulating position over short distances in the range of twenty feet.
Newer technologies, such as Time Domain Corporation's ultra-wide band devices, currently track
transponders up to a range of approximately three hundred feet. Furthermore, companies such as Trakus,
Inc. have been working on microwave based transmitters to be placed in a player's helmet that could
alternatively be used to track the roving camera assemblies 42c. Regardless of the LPS technology chosen,
transponders 47p-1 and 47p-2 are in communication with a multiplicity of tracking receivers, such as 43a,
43b, 43c and 43d, that have been placed throughout the area designated for movement of the roving camera
assembly 42c. Tracking receivers such as 43a through 43d are in communication with transponder tracking
system (LPS) 900 that calculates individual transponder coordinates based upon feedback from receivers,
such as 43a through 43d. Once each transponder 47p-1 and 47p-2 has been individually located in the local
(x, y, z) space, then the two together will form a line segment parallel to the line-of-sight 47f-cv within
camera 47f. Coincident with this determination of line-of-sight 47f-cv, the electronic zoom of camera 47f
will be augmented to read out the currently selected zoom depth. This information can then either be
transmitted through one or both transponders 47p-1 and 47p-2 or be transmitted via any typical wireless
or wired means. Together with the line-of-sight 47f-cv, the current zoom setting on camera 47f will yield
the expected field-of-view of current image 10c.
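
Deriving the line of sight from the two located transponders amounts to normalizing the vector between
them. The sketch below assumes the transponders are mounted so that one is "rear" and one is "front"
along the viewing direction; that labelling and the function name are assumptions, not details from the
specification.

```python
import math


def line_of_sight_from_transponders(rear_xyz, front_xyz):
    """Unit vector parallel to the segment from rear to front transponder."""
    delta = [f - r for r, f in zip(rear_xyz, front_xyz)]
    length = math.sqrt(sum(v * v for v in delta))
    if length == 0:
        raise ValueError("transponders must be at distinct locations")
    return tuple(v / length for v in delta)
```

Combined with the zoom depth read out from the camera, this direction gives the expected field-of-view
in the same local coordinate space used by the tracking system.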

Also depicted in Fig. 13 are two overhead tracking assemblies 20c-A and 20c-B, each with
fields-of-view 20v-A and 20v-B, respectively. Using the combination of information derived by tracking
system 100, namely the relative locations and orientation of players, such as 10-1 and 10-2, as well as the
determined field-of-view 47f-cv of roving camera 42c, system 900 can ultimately determine which players,
such as 10-1, are presently in view of which roving camera assemblies, such as 42c. This information
aids system 900 as it automatically chooses the best camera feeds for blending into encoded broadcast 904.
Referring next to Fig. 14, there is shown a combination block diagram depicting the player & referee
identification system (using jersey numbers) 500 and a perspective drawing of a single player 10. Player
10 is within view of multiple ID camera assemblies, such as 50c-1, 50c-2, 50c-3 and 50c-4, preferably
spread throughout the perimeter of the tracking area. Also depicted is a single representative overhead
tracking assembly 20c with overhead view 20v of player 10. Using overhead views such as 20v, tracking
system 100 is able to determine player 10's current location 10loc and orientation 10or with respect to a
preset local coordinate system. Location 10loc and orientation 10or are then stored in tracking database
101. Using this and similar information from database 101, ID camera selection module 500a of
identification system 500 is able to select an individual ID camera assembly, such as 50c-1, that is best
positioned for a clear line-of-sight of the back of player 10's jersey. Selection module 500a maintains a
database 501 of the current camera location & orientation for each ID assembly such as 50c-1 through
50c-4. Each assembly, such as 50c-1, comprises an ID camera similar to 55f under direct pan, tilt and zoom
motor control as well as shutter control from a processing element 55a, similar to automatic filming stations
40c. This element 55a ensures that the shutter of camera 55f is only activated when both the lamps
providing ambient light are discharging and the camera 55f is at an allowed pan, tilt and zoom setting.
Using pre-known information regarding typical helmet 9 dimensions and player 10 sizes, the captured
images 55c are automatically cropped by image analysis element 55g to form a minimal image 503x in
which the player's jersey number and name are expected to reside. This minimal image 503x is transmitted
back to pattern matching and identification module 500b for pattern matching with the pre-known set of
jersey backs stored in database 502. Similar to automatic filming assemblies 40c, ID assemblies, such as
50c-1, are capable of pre-capturing and saving backgrounds, similar to 2r, shown in Figs. 5a and 6, from
allowed limited pan and tilt angles as well as zoom depths for ID camera 55f. Hence, minimal image 503x
can be further limited to only foreground image pixels after elimination of the background using
techniques similar to those shown in Fig. 6a.
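
One plausible way to pick the ID camera with the clearest view of the back of a jersey is to choose the
camera lying most nearly behind the player. The sketch below is an assumption-laden illustration, not the
selection logic of module 500a: it works in two dimensions, treats the orientation as a facing angle in
degrees measured from the x axis, ignores occlusion by other players, and uses hypothetical names.

```python
import math


def select_id_camera(player_xy, facing_deg, cameras):
    """Return the name of the camera most directly behind the player.

    player_xy: (x, y) location from the tracking database.
    facing_deg: direction the player faces (assumed convention).
    cameras: mapping of camera name to its (x, y) location.
    """
    px, py = player_xy
    facing = math.radians(facing_deg)
    back = (-math.cos(facing), -math.sin(facing))  # direction the jersey back faces

    def alignment(item):
        _, (cx, cy) = item
        dx, dy = cx - px, cy - py
        distance = math.hypot(dx, dy)
        if distance == 0:
            return -1.0  # camera at the player's position is unusable
        # Cosine of the angle between "behind the player" and the camera.
        return (dx * back[0] + dy * back[1]) / distance

    return max(cameras.items(), key=alignment)[0]
```

With four cameras placed around a player at the origin who faces along the positive x axis, the camera
on the negative x side would be selected.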

Pattern matching and identification module 500b uses pre-known jersey images & (associated) players
database 502 in order to conduct standard pattern matching techniques, as are well known in the art. Note
that the player & referee identification system 500 is only necessary if the helmet stickers such as 9a are
not being used for any reason (such as would be the case in a sport like basketball where players, such as
10, are not wearing helmets, such as 9). When used, system 500 is expected to receive images such as 503x
of selected players, such as 10, at the maximum capture rate designed for ID camera assemblies, such as
50c-1. For example, this may yield between 30 and 60 minimal images 503x per second. In practice, the
present invention is expected to only perform jersey identification of a player, such as 10, when that player
either first enters the view of tracking system 100 or merges views with another player. Furthermore, it is
expected that simple bumping into other players, such as 10, or even players clumping together, such as
10-2&3 (shown in previous figures), will still not cause tracking system 100 to lose the identity of any
given player. Hence, once identified by this jersey pattern match method, the player 10's identity is then
fed back to the tracking database 101 by identification module 500b, thus allowing tracking system 100 to
simply follow the identified player 10 as a means for continuing to track identities. When tracking system
100 encounters a situation where two or more players, such as 10-2 and 10-3, momentarily merge such that
they are no longer individually discernible, then when these same players are determined to have
separated, system 100 will request that identification system 500 reconfirm their identities. In such a case,
tracking system 100 will provide a list of the players in question so that identification module 500b can
limit its pattern matching to only those jerseys worn by the unidentified players.
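
Restricting the pattern match to the players in question can be sketched as below. The matcher here is a
deliberately crude stand-in (a count of agreeing cells between small binary patterns) and is not the
"standard pattern matching" the specification refers to; the function name and data layout are
assumptions made only to show how the candidate list narrows the search.

```python
def match_jersey(image, templates, candidates=None):
    """Return the best-matching player name, or None if nothing to match.

    image: flat tuple of 0/1 cells from the cropped jersey image.
    templates: mapping of player name to a flat tuple of 0/1 cells.
    candidates: optional list of names supplied by the tracking system;
    when given, only those players' templates are compared.
    """
    if candidates is None:
        names = list(templates)
    else:
        names = [name for name in candidates if name in templates]
    if not names:
        return None

    def agreement(name):
        return sum(a == b for a, b in zip(image, templates[name]))

    return max(names, key=agreement)
```

Limiting the comparison to two or three recently merged players reduces both the work per image and the
chance of confusing a player with a teammate elsewhere on the surface.
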
Referring next to Fig. 15, there are shown two Quantum Efficiency Charts for a typical CMOS sensor
available in the commercial marketplace. Specifically, the upper chart is for the part number ????, a
Monochrome sensor sold by the Fill Factory of Belgium; while the lower chart is for their Color sensor,
part number ???. With respect to the Monochrome Chart 25q-M, it is important to note that the sensor is
primarily designed to absorb frequencies in the visible spectrum ranging from 400 nm to 700 nm, where its
quantum efficiency peaks between 500 nm and 650 nm. However, as is evident by reading Chart 25q-M,
this sensor is also capable of significant absorption in the near IR range from 700 nm to 800 nm and
beyond. In this near IR region, the efficiency is still roughly 60% of the peak. Although not depicted in
Chart 25q-M, the monochrome sensor is also responsive to the UVA frequencies below 400 nm with at
least 40% to 50% of peak efficiency. As will be discussed in more detail with reference to Figs. 16a, 16b
and 16c, the Color sensor as depicted in Chart 25q-C is identical to the monochrome sensor excepting that
various pixels have been covered with filters that only allow restricted frequency ranges to be passed. The
range of frequencies passed by the filter is then absorbed by the pixel below and determines that individual
pixel's color sensitivity. Hence, pixels filtered to absorb only "blue light" are depicted by the leftmost peak
in Chart 25q-C that ranges from approximately 425 nm to 500 nm. Similarly, pixels filtered to absorb only
"green light" are shown as the middle peak ranging from 500 nm to 600 nm. And finally, the rightmost
peak is for "red light" and ranges from 600 nm to roughly 800 nm. The present inventors taught in prior
applications, of which the present application is a continuation, that it is beneficial to match non-visible
tracking energies emitted by surrounding light sources with special non-visible, or non-visually apparent
coatings, marking important locations on players and equipment, along with the absorption curves of the
tracking cameras. This matching of emitted non-visible light with non-visible reflecting marks and
non-visible absorbing sensors provided a means for tracking specific locations on moving objects without
creating observable distractions for the participants and spectators. The present invention will expound
upon these teachings by showing the ways in which these non-visible tracking energies can be effectively
intermeshed with the visible energies used for filming. In this way, a single view, such as 20v or 40f-cv, of
the movement of multiple objects can be received and effectively separated into its visible filming and
non-visible tracking components.
Referring next to Fig. 16a, there is depicted a typical, unmodified 16 pixel Monochrome Sensor 25b-M.
Each pixel, such as 25p-M1, is capable of absorbing light frequencies at least between 400 nm to 900 nm
as depicted in Chart 25q-M. Referring next to Fig. 16b, there is depicted a typical, unmodified 16 pixel
Color Sensor 25b-C. Each "blue" pixel, such as 25p-B, is capable of absorbing light frequencies primarily
between 400 nm to 500 nm. Each "green" pixel, such as 25p-G, is capable of absorbing light frequencies
primarily between 500 nm to 600 nm while each "red" pixel, such as 25p-R, absorbs primarily between
600 nm to 800 nm. Referring next to Fig. 16c, there is shown a novel arrangement of pixels as proposed by
the present inventors. In this new Monochrome / IR Sensor 25b-MIR, every other pixel, such as 25p-M, is
filtered to absorb frequencies primarily between 400 nm to 700 nm (rather than to 800 nm), while the
remaining pixels, such as 25p-IR, are filtered to absorb primarily between 700 nm to 800 nm. The resulting
sensor 25b-MIR is then capable of alternately being processed as a visible light monochrome image that
has advantages for image analysis as taught especially in Fig. 6a, and a non-visible light IR image that will
yield information concerning specially placed non-visible markings on either the players or their
equipment. The resulting intermeshed monochrome / IR image offers significant advantages for image
analysis as will be further discussed in the specification of Fig. 17.
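
Separating the intermeshed readout into its two images can be sketched as follows. The sketch assumes
that "every other pixel" means a checkerboard layout with a monochrome pixel in the top-left corner; the
specification's Fig. 16c may use a different arrangement, and the function name is hypothetical. Missing
positions are left as None rather than interpolated.

```python
def split_mono_ir(raw):
    """Split an interleaved sensor readout into (monochrome, ir) images.

    raw: list of rows of sensor values. Pixels where (row + column) is
    even are assumed to be visible-light monochrome pixels; the rest
    are assumed to be near-IR pixels.
    """
    mono = [[value if (r + c) % 2 == 0 else None
             for c, value in enumerate(row)]
            for r, row in enumerate(raw)]
    ir = [[value if (r + c) % 2 == 1 else None
           for c, value in enumerate(row)]
          for r, row in enumerate(raw)]
    return mono, ir
```

Each output keeps the original pixel coordinates, so a marking found in the IR image maps directly onto
the same location in the monochrome image.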

Referring next to Fig. 16d, there is depicted a standard RGB double prism that is typically used to separate
the red, green and blue frequencies of light so that they can then be directed to three distinct imaging
sensors. This configuration is often found in commercially available 3-CCD cameras. In the present
configuration, light ray 25r passes through lens 24L and is first refracted by prism 24P-1. This refraction is
designed to separate the frequencies ranging from 400 nm to 500 nm away from ray 25r, thereby forming
ray 25r-B (blue light) that is then reflected off the back of lens 24L, through sensor lens 24L-1 onto
monochrome sensor 25b-M1. The remaining portion of ray 25r passes through prism 24P-1 and is then
further refracted by prism 24P-2. This second refraction is designed to pass the frequencies from 500 nm to
600 nm through as 25r-G (green light) while separating the frequencies from 600 nm through 800 nm off
as 25r-R (red and near-IR light). Ray 25r-G continues through sensor lens 24L-2 onto monochrome sensor
25b-M2. Ray 25r-R is subsequently reflected off the back of prism 24P-1, through sensor lens 24L-3 onto
monochrome-IR sensor 25b-MIR. This configuration provides many benefits including: 1- the ability to
process the image in full color, with maximum pixels per red, green and blue; 2- the ability to precisely
overlay and interpolate the color images in order to form a monochrome image; and 3- the ability to detect
reflections of the non-visible IR tracking energy due to the unique construction of the monochrome-IR
sensor 25b-MIR. The benefits of this arrangement will be further described in the specification of Fig. 17.
Referring next to Fig. 16e, there is shown a variation of the typical two-prism lens system commercially
available for separating red, green and blue frequencies. Specifically, the second prism is removed and the
angles and reflective properties of the first prism are adjusted, as is understood by those skilled in the art,
so that the frequencies of 400 nm to 700 nm, represented as ray 25r-VIS (visible light), are separated from
the frequencies of 700 nm and higher, represented as ray 25r-IR (near IR). In this configuration, visible
light ray 25r-VIS passes through prism 24P and continues through sensor lens 24L-2 onto color sensor
25b-C. Near IR ray 25r-IR is subsequently reflected off the back of lens 24L and through sensor lens 24L-1
onto monochrome sensor 25b-M. This resulting configuration requires one less sensor than the arrangement
taught in Fig. 16d while still providing both a color image (also monochrome via interpolation) and an IR
image for detecting reflections of the non-visible IR tracking energy. This arrangement will exhibit less
color fidelity since the visible light frequencies from 400 nm through 700 nm are detected by a single
sensor, rather than the three sensors specified in Fig. 16d. The present inventors prefer using a
commercially available product referred to as a "hot mirror" as the single prism 24P. These "hot mirrors,"
as sold by companies such as Edmund Optics, are specifically designed to reflect away the IR frequencies
above 700 nm when aligned at a 45° angle to the oncoming light energy. Their traditional purpose is to
reduce the heat buildup in an optical system by not allowing the IR frequencies to pass through into the
electronics. This non-traditional use of the "hot mirror" as the prism in a two lens system will provide the
novel benefit of creating a color image of the subject matter with a simultaneous, overlapped IR image in
which "non-visible" markings can be discerned.

Referring next to Fig. 16f, there is depicted the same lens, prism and sensor arrangement as described in
Fig. 16e except that visible ray 25r-VIS passes through sensor lens 24L-2 onto a monochrome sensor 25b-M
rather than a color sensor 25b-C. This configuration offers the advantage of directly providing a
monochrome image, which is often preferred for machine vision applications, without the processing
requirements associated with interpolating a color image to get the monochrome equivalent, thereby
allowing for faster image processing. Note that the image is still alternately available in the overlapped IR
view via the monochrome sensor that receives ray 25r-IR through lens 24L-1. Furthermore, the "hot
mirror" discussed in Fig. 16e is also equally applicable to Fig. 16f.

Referring next to Fig. 17, there are shown the three fundamental steps being taught in the present invention
for: first, extracting foreground objects such as players 10-1 and 10-2&3; second, searching the extracted
objects in the intermeshed non-visible frequencies such as IR, in order to best locate any specially placed
markings similar to 5; and third, creating a motion point model as taught in prior applications by the
present inventors. Specifically, referring to Step 1 in Fig. 17, there are shown the extracted player images
10-1 and 10-2&3. The preferred extraction process is the same as that described in Fig. 6a, which is
readily performed using fixed cameras such as overhead tracking assemblies 20c as depicted in Fig. 5a.
For perspective filming, the present invention teaches the use of automatically controlled filming
assemblies such as 40c in Fig. 11a. These assemblies 40c are built to facilitate foreground extraction by
limiting image capture to allowed angles of pan and tilt as well as zoom depths for which prior background
images may be pre-captured, as previously described. Whether using overhead assemblies 20c or filming
assemblies 40c, after the completion of Step 1, those pixels determined to contain the foreground object,
such as 10-1 and 10-2&3, will have been isolated.

In Step 2, the equivalent extracted foreground pixels are re-examined in the non-visible frequency range
(e.g. IR), such as would be available, for instance, by using sensor 16c directly, or multi-sensor cameras
such as depicted in Fig.'s 16d, 16e and 16f. As the equivalent IR image pixels are examined, those areas on the
foreground object where a non-visibly apparent, tracking energy reflecting surface coating 5 has been
affixed are more easily identified. As shown in Step 3, the located tracking energy reflective marks 5r can
then be translated into a set of body and equipment points 5p that themselves can be later used to
regenerate an animated version of the players and equipment as taught in prior related applications.
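
The three steps of Fig. 17 can be illustrated with a minimal sketch. It is not the inventors' implementation: the function names, the thresholds and the 4x4 toy frames are all assumptions made for illustration, and Step 3 is reduced to taking the centroid of one cluster of marks. The sketch assumes the visible and IR frames are already registered pixel-for-pixel, as the overlapped sensor arrangements of Fig.'s 16d through 16f are intended to provide.

```python
def foreground_mask(current, background, threshold=30):
    """Step 1: mark pixels that differ from the pre-captured background."""
    return [
        [abs(cur - bg) > threshold for cur, bg in zip(cur_row, bg_row)]
        for cur_row, bg_row in zip(current, background)
    ]


def find_ir_marks(ir_image, mask, ir_threshold=200):
    """Step 2: look for bright IR reflections, but only inside the foreground."""
    return [
        (r, c)
        for r, row in enumerate(ir_image)
        for c, value in enumerate(row)
        if mask[r][c] and value >= ir_threshold
    ]


def marks_to_point(marks):
    """Step 3 (greatly simplified): reduce one cluster of marks to a centroid."""
    if not marks:
        return None
    rows = [r for r, _ in marks]
    cols = [c for _, c in marks]
    return (sum(rows) / len(rows), sum(cols) / len(cols))


# Toy 4x4 grayscale frames: a 2x2 "player" stands on a flat background.
background = [[10] * 4 for _ in range(4)]
visible = [row[:] for row in background]
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    visible[r][c] = 120

# IR frame: one marker on the player and one stray glint on the background.
infrared = [[0] * 4 for _ in range(4)]
infrared[1][2] = 250   # reflective mark on the player
infrared[0][0] = 255   # background glint, skipped because it is not foreground

mask = foreground_mask(visible, background)
marks = find_ir_marks(infrared, mask)
body_point = marks_to_point(marks)
```

Restricting the IR search to the foreground mask is what discards the stray glint at (0, 0); only the mark on the player survives.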

Referring next to Fig. 18, there is shown a single player 10 within the view of four camera assemblies,
each with its own distinct purpose as previously taught and herein now summarized. First, there is
overhead tracking camera assembly 20c, whose purpose is to locate all foreground objects, such as 10,
within its overhead or substantially overhead view 20v. Once located, images collected by assemblies 20c
will be analyzed to determine player 10 identity through the recognition of special markings such as helmet
sticker 9a on helmet 9. Images from assemblies such as 20c are also used to locate the game object, such
as a puck 3 for ice hockey. The combination of player 10 and game object 3 location information
determined by analysis of the overhead images is subsequently used to automatically direct filming camera
assemblies, such as 40c. Filming assemblies, such as 40c, are controlled so that they will only capture their
images at allowed pan & tilt angles as well as zoom depths. This control allows for the "pre-capture" of
images of the background at all possible angles and depths, thus forming a database of tracking surface and
area backgrounds that are used to facilitate the efficient extraction of player 10 images from the filming
images. In addition to location information, the images from the tracking assemblies, such as 20c, also
provide the orientation of individual players 10. This orientation information, along with the player 10's
location, is then used by jersey identification assemblies 50c to zoom in on the appropriate portion of the
player 10 where their identifying markings, such as a jersey number and player name, are expected to be
found. This process results in jersey id pattern images 503x that can then be matched against a
predetermined database of pattern images in order to identify a given player within an acceptable
confidence level.
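
As a hedged illustration of matching a captured jersey pattern against a predetermined database "within an acceptable confidence level," the sketch below scores tiny binary patterns by the fraction of agreeing pixels. The scoring rule, the 0.9 threshold, the ten-pixel patterns and the player keys are assumptions chosen for brevity; the specification does not state which matching method is used.

```python
def match_score(pattern, reference):
    """Fraction of pixels on which two equally sized binary patterns agree."""
    if len(pattern) != len(reference):
        raise ValueError("patterns must have the same size")
    agreeing = sum(1 for p, q in zip(pattern, reference) if p == q)
    return agreeing / len(pattern)


def identify_player(pattern, database, min_confidence=0.9):
    """Return (player_id, score); player_id is None below the confidence level."""
    best_id, best_score = None, 0.0
    for player_id, reference in database.items():
        score = match_score(pattern, reference)
        if score > best_score:
            best_id, best_score = player_id, score
    if best_score >= min_confidence:
        return best_id, best_score
    return None, best_score


# Hypothetical database of flattened binary jersey patterns.
database = {
    "17": [1, 1, 0, 0, 1, 0, 1, 1, 0, 1],
    "71": [0, 1, 1, 0, 1, 1, 0, 1, 0, 0],
}

observed = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]   # differs from "17" in one pixel
player, confidence = identify_player(observed, database)
```

A pattern that resembles no stored jersey well enough is reported as unidentified rather than forced onto the nearest entry, which is the role the confidence level plays in the text.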

And finally, the orientation and location of player 10 are used to direct three-dimensional model filming
assemblies 19c (shown in Fig. 18 for the first time). There are several options for the specific construction
of assemblies 19c, whose purpose is to collect visible light images, such as 10c-M of player 10, intermeshed
or concurrent overlying with non-visible images, such as 10c-IR. Note that assembly 19c may include
its own additional tracking energy source, such as IR ring light 19rl, that emits non-visible tracking energy
for the better illumination of non-visible player markings, such as 5 on player 10. As intermeshed or
concurrent overlapping images such as 10c-M and 10c-IR are continuously analyzed, the process of
locating important player 10 body-points, which are indicated by markings such as 5, is greatly
facilitated since the search may be limited to only those pixels determined to be in the foreground. As
previously taught, this is enabled through the control of pan & tilt angles as well as zoom depth on model
filming assemblies 19c, similar to game filming assemblies 40c. This control facilitates gaining
pre-knowledge concerning the background that leads to efficient image foreground extraction. Knowing the
player 10's orientation also helps the analysis of non-visible markings in image 10c-IR since it provides
logical inferences as to which body-points are likely to be in view, thereby limiting the determination steps.
All assemblies 20c, 40c, 50c and 19c are synchronized to the environment lighting via the power lines that
drive this lighting. This synchronization ensures maximum and consistent ambient lighting when images are
captured. Assemblies 19c are also similarly synchronized to any added tracking energy emitting lamps.

Referring next to Fig. 19, there is depicted a typical youth ice hockey rink that is being used to teach the
gathering of spectator audio and video database 402 that can then be combined with the overhead images
102, automatic game film 202 and manual game film 302 in order to create a more complete encoded
broadcast 904, as shown in Fig. 1. Spectators to be filmed, such as parents 13-1 and 13-2 as well as coach
11, are first provided with transponders 410-1, 410-2 and 410-3 respectively. As will be understood by
those skilled in the art, various technologies are either available or coming available that allow for accurate
local positioning systems (LPS). For instance, radio frequency tags can be used for triangulating position
over short distances in the range of twenty feet. Newer technologies, such as Time Domain Corporation's
ultra-wide band devices, currently track transponders up to a range of approximately three hundred feet.
Furthermore, companies such as Trakus, Inc. have been working on microwave based transmitters, such as
9t, to be placed in the helmet of a player such as 10-6. Any of these various types of transmitters could also be
used to track key spectators such as team coaches 11 or the parents 13-1 and 13-2. Regardless of the
technology chosen, transponder tracking system 900 will gather location information from receivers such
as 43a, 43b, 43c, 43d, 43e and 43f strategically placed throughout the surrounding tracking area. Receivers
such as 43a through 43f will receive signals from transponders such as 410-1, 410-2, 410-3 and even 9t,
thereby providing data supporting the triangulation and location of each transponder. This location
information will typically be calculated from ten to thirty times per second and stored in the spectator
tracking database 401.
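
One way such receiver data could be turned into a position is sketched below. The specification speaks only of "triangulation" and does not say what each receiver measures, so this example assumes that every receiver reports a range (distance) to the transponder and solves the resulting circle equations by linear least squares in two dimensions. The function name, the receiver layout and the units are hypothetical.

```python
import math


def locate(receivers, distances):
    """Estimate (x, y) from three or more receiver positions and measured ranges.

    Subtracting the first range equation from the others removes the squared
    unknowns and leaves linear equations a*x + b*y = c, which are solved in
    the least-squares sense through the 2x2 normal equations.
    """
    if len(receivers) < 3 or len(receivers) != len(distances):
        raise ValueError("need at least three receivers, one range each")
    (x0, y0), d0 = receivers[0], distances[0]
    rows = []
    for (xi, yi), di in zip(receivers[1:], distances[1:]):
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = d0 ** 2 - di ** 2 + xi ** 2 + yi ** 2 - x0 ** 2 - y0 ** 2
        rows.append((a, b, c))
    saa = sum(a * a for a, _, _ in rows)
    sab = sum(a * b for a, b, _ in rows)
    sbb = sum(b * b for _, b, _ in rows)
    sac = sum(a * c for a, _, c in rows)
    sbc = sum(b * c for _, b, c in rows)
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("receivers are collinear; position is ambiguous")
    return ((sac * sbb - sab * sbc) / det, (saa * sbc - sab * sac) / det)


# Four hypothetical receivers at the corners of a 100 x 60 area.
receivers = [(0.0, 0.0), (100.0, 0.0), (0.0, 60.0), (100.0, 60.0)]
true_position = (30.0, 20.0)
distances = [math.hypot(true_position[0] - x, true_position[1] - y)
             for x, y in receivers]

estimate = locate(receivers, distances)
```

With noise-free ranges the estimate reproduces the true position; with real measurements, repeating the calculation ten to thirty times per second, as the text describes, would produce a track that could then be smoothed.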

Spectator tracking and filming system 400 then uses spectator location information from database 401 to
automatically direct movable, controllable spectator filming cameras such as 60-1, 60-2 and 60-3.
Spectator filming cameras are attached to individual or continuous rail 62, thereby facilitating controlled
side-to-side movement of cameras such as 60-1. Camera 60-1 is attached to rail 62 via motorized swivel
and extension arm 61 that is capable of panning and tilting, as well as raising and lowering camera 60-1.
Movement instructions are provided by system 400 via wireless link 60L. While the bandwidth required to
transmit movement instructions is anticipated to be minimal, the subsequent download of video from the
camera 60-1 to system 400 will require higher bandwidths. Given these increased bandwidth requirements,
the present inventors prefer implementing the link 60L in a technology such as Time Domain
Corporation's ultra-wide band (UWB). It is also possible that camera 60-1 communicates with system 400
via traditional network cable. In addition to spectator video information, it is also desirable to collect
ambient sound recordings. These audio recordings can be used by content assembly & compression system
900 to blend directly with the captured game and spectator film. Alternatively, system 900 may use at least
the decibel and pitch levels derived from the recorded ambient audio to drive the overlay of synthetic
crowd noise. Hence, the overlaid synthetic crowd noise would ideally be a function and multiple of the
actual captured spectator noise, thereby maintaining accuracy while adding excitement. Audio capture
devices 72 accept sound through microphones 73 and then transmit this information to system 400 for
storage in the spectator A/V database 402. Additionally, spectator filming cameras such as 60-3, that are
anticipated to be focused on either coach 11 or players such as 10-8 in the team bench, may optionally be
outfitted with zoom microphone 60m. Such microphones are capable of detecting sound waves generated
within a small area from a long distance, as will be understood by those skilled in the art.
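
The idea of synthetic crowd noise that is "a function and multiple" of the captured spectator noise can be sketched as follows. This is only an assumed reading: the level of the ambient audio is taken as its RMS amplitude, the synthetic track's gain is a fixed multiple of that level, and the gain is capped to avoid clipping. Pitch is not modeled, and the function names, the multiple and the sample values are hypothetical.

```python
import math


def rms_level(samples):
    """Root-mean-square amplitude of a block of samples in the range -1..1."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def to_decibels(level, reference=1.0):
    """Express a linear level in decibels relative to `reference`."""
    return 20.0 * math.log10(max(level, 1e-12) / reference)


def synthetic_crowd_gain(samples, multiple=2.0, ceiling=1.0):
    """Gain for the synthetic crowd track: a capped multiple of the real level."""
    return min(ceiling, multiple * rms_level(samples))


quiet_rink = [0.05, -0.05, 0.05, -0.05]   # a few murmuring parents
goal_scored = [0.4, -0.4, 0.4, -0.4]      # the same crowd cheering

quiet_gain = synthetic_crowd_gain(quiet_rink)
cheer_gain = synthetic_crowd_gain(goal_scored)
```

Because the gain tracks the measured level, the overlay swells only when the real spectators do, which is the sense in which accuracy is maintained while excitement is added.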

Also depicted in Fig. 19 is coach's event clicker 420. This wireless device at a minimum includes a single
button that may be depressed any time throughout the ensuing game. Each of many possible clickers, such
as 420, is uniquely encoded and pre-matched to each team coach, such as 11. This allows each individual
coach to create time markers associated with their name to be used to time segment the captured game film
along with the events objectively measured and determined by performance measurement & analysis
system 700. Hence, each time a coach, such as 11, depresses the appropriate button on the event clicker 420,
clicker 420 generates a unique signal combining an electronic indication of the button(s) depressed and
that clicker's identifying code. Receivers such as 43a through 43f are capable of detecting these
transmitted signals from clicker 420, after which they are passed on to performance measurement & analysis
system 700 that automatically includes each transmission as a detected game event. In this way, a coach
such as 11 may instantly recall game film from either the overhead or perspective angles as stored in
databases 102 and 202 respectively, simply by selecting their designated marker based upon its recorded
time code. And finally, Fig. 19 also shows an additional placement of automatic filming assemblies, such
as 40c discussed in relation to Fig. 11a. This placement of filming assembly 40c, essentially "within the
boards," allows for various "interest shots" of the game as opposed to more traditional game film views.
For example, assemblies 40c placed at lower filming levels can be used to capture the movement of
players' feet as they enter the ice or to make "ice level" film of activity in front of the goal-tender. The
point of such film, similar to the reason for capturing spectator film, is to add to the story line of the
encoded broadcast 904 by mixing in novel film shots.
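
Returning to the coach's event clicker described above, the following sketch shows one possible shape for the time markers it produces. The record fields, the clicker codes and the clip window of ten seconds before and five seconds after each marker are illustrative assumptions; the specification states only that each signal combines the button pressed with the clicker's identifying code and that film is later recalled by time code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClickerSignal:
    clicker_code: str   # unique code pre-matched to one coach
    button: int         # which button was depressed
    time_code: float    # seconds on the shared game-film clock


def markers_for(signals, clicker_code):
    """All time markers created by one coach's clicker, in film order."""
    return sorted(s.time_code for s in signals if s.clicker_code == clicker_code)


def clip_window(time_code, before=10.0, after=5.0):
    """Film segment to recall around a marker, clamped at the start of film."""
    return (max(0.0, time_code - before), time_code + after)


# Signals as they might arrive from the receivers, in no particular order.
received = [
    ClickerSignal("coach-home", 1, 754.2),
    ClickerSignal("coach-away", 1, 310.0),
    ClickerSignal("coach-home", 1, 4.0),
]

home_markers = markers_for(received, "coach-home")
first_clip = clip_window(home_markers[0])
```

Keying each marker by clicker code keeps one coach's markers separate from another's even though every clicker is heard by the same receivers.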

Referring next to Fig. 20, there is depicted a typical scoreboard 650 that would be found in a youth ice
hockey rink. A parent or rink employee 613 usually controls scoreboard 650 via scoreboard input device
630. For the present invention, it is desirable to capture official game start and stop times as well as referee
indications of penalties and game scoring. U.S. Patent number 5,293,354, for a Remotely Actuated Sports
Timing System, teaches "a remotely actuatable sports timing system (that) automatically responds to a
whistle blown by the sports official to generate a frequency modulated radio signal which is utilized to
provide an instantaneous switching signal to actuate the game clock." This system is predicated on the
ability of a microphone, worn by the referee, to pick up the sound of a blown whistle that is typically
generated in a pre-known frequency such as 3150 hertz. Upon proper detection, a radio transmitter
connected to the microphone transmits a radio signal that is picked up by a receiver, electronically verified
and then used to stop the official game clock.

The present inventors suggest an alternative approach that includes airflow detecting whistle 601, with
pinwheel detector / transmitter 601a. As referee 12 blows into whistle 601, creating airflow through the
inner chamber and out the exit hole, pinwheel 601a is caused to spin. As pinwheel 601a spins, a current
flow is induced by the rotation of the pinwheel shaft, as will be understood by those skilled in the art. This
current is then detected and used to initiate the transmission of stop signal 605 that is picked up by receiver
640. Receiver 640 then transmits signals to scoreboard control system 600 that is connected to scoreboard
650 and automatically stops the game clock. Since each pinwheel 601a resides inside of an individual
referee's whistle, it is capable of positively detecting only one referee's airflow, and therefore the
indication of the activating referee such as 12. Hence, with the presently taught whistle 601, by encoding
each pinwheel 601a with a unique electronic signature, control system 600 can determine the exact referee
that initiated the clock stoppage, providing additional valuable information over the aforementioned
external microphone detector approach.

Note that with the aforementioned Remotely Actuated Sports Timing System, it is possible for one referee
to blow his whistle causing sound waves at the pre-known frequency that are then picked up by more than
one radio transmitter worn by one or more other game officials. Therefore, this system is not reliable for
uniquely identifying which referee initiated the clock stoppage by blowing their whistle. A further
difficulty of this unique frequency / sound approach is that referees are not always consistent in the airflow
that they generate through their whistle. For the present inventors, pinwheel 601a will be calibrated to
detect a wide range of airflow strengths, each of which could generate a slightly or significantly different
sound frequency. This difference will be immaterial to the present invention but may be problematic to
detection by remote radio transmitters.

An additional advantage taught by the present inventors occurs for the sport of ice hockey, which designates
the starting time of the game clock as the moment the referee 12 drops the game puck 3. In order to automatically
detect the dropping of the game puck 3, pressure sensing band 602 is designed to be worn by referee 12,
for instance over his first two fingers as depicted. Band 602 includes on its underside pressure sensing
area 602b that is capable of detecting sustained force, or pressure, as would be caused by the grasping of
puck 3 by referee 12. Sensing area 602b is connected to electronics and transmitter 602c that first sends an
"on" signal to LED 602a when sufficient pressure is detected, thereby allowing referee 12 to visually
confirm that the puck is "engaged" and "ready-to-drop." Once puck 3 is released, sensing area 602b
changes its state, causing electronics and transmitter 602c to emit start signal 606 that is picked up by
receiver 640. Receiver 640 then transmits signals to scoreboard control system 600 that is connected to
scoreboard 650 and automatically starts the game clock. Since each pressure sensing band 602 is worn by
an individual referee, it is only capable of detecting the "engage / puck drop" of that referee, thereby
providing unique identification. By encoding each band 602 with a unique electronic signature, control
system 600 can determine the exact referee that initiated the clock start.
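
The engage-then-release behavior of the pressure sensing band can be modeled as a two-state machine, sketched below under stated assumptions. The class name, the pressure threshold, the requirement of three consecutive samples for "sustained" pressure and the dictionary used for the start signal are all hypothetical; the specification describes the behavior but not its electronics or signal format.

```python
class PressureBand:
    """Toy model of band 602: LED on when the puck is gripped, signal on release."""

    def __init__(self, signature, threshold=5.0, hold_samples=3):
        self.signature = signature        # unique electronic signature of the band
        self.threshold = threshold        # pressure regarded as a grip
        self.hold_samples = hold_samples  # consecutive samples needed to engage
        self.engaged = False
        self.led_on = False
        self._held = 0

    def update(self, pressure):
        """Feed one pressure sample; return a start signal when the puck drops."""
        if not self.engaged:
            self._held = self._held + 1 if pressure >= self.threshold else 0
            if self._held >= self.hold_samples:
                self.engaged = True
                self.led_on = True       # referee sees "ready-to-drop"
            return None
        if pressure < self.threshold:    # grip released: the puck was dropped
            self.engaged = False
            self.led_on = False
            self._held = 0
            return {"signal": "start", "referee": self.signature}
        return None


band = PressureBand("ref-12")
samples = [0.0, 6.0, 6.0, 6.0, 6.0, 0.0]
events = [e for e in (band.update(p) for p in samples) if e is not None]

# A brief bump that never becomes a sustained grip produces no start signal.
bumped = PressureBand("ref-14")
bump_events = [e for e in (bumped.update(p) for p in [6.0, 0.0]) if e is not None]
```

Requiring sustained pressure before the band engages is what keeps an incidental bump from starting the clock, and carrying the signature in the signal is what lets the control system name the referee.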

In Fig. 20, whistle 601 and band 602 are shown as a single integrated device. The present inventors
anticipate that these may be separate devices, as would be the case if they were worn on different hands.
Furthermore, it is possible to use band 602 with the whistle technology that already exists in the
marketplace without departing from the teachings concerning the detection of clock start time. Other
additional uses exist for control system 600, including the ability to accept information from a game official
during a clock stoppage such as, but not limited to: 1) player(s), such as 10, involved in scoring, 2) type of
game infraction, and 3) player(s), such as 10, involved in game infraction and their penalties. System 600
is connected via a traditional network to tracking system 100 such that the exact start and stop clock times
as well as other official information can be provided and synchronized with the collected game film and
performance measurements, all of which is eventually incorporated into encoded broadcast 904.
Furthermore, tracking system 100 is able to detect the exact time of any goal scoring event such as a puck
3 entering the net area, a basketball going through a hoop or a football crossing a goal line. In all cases, the
event that was detected by image capture and determined through image analysis will be stored in the
performance measurement and analysis database 701 along with its time of occurrence. In the case of ice
hockey and football, these detected events will be used to initiate a game clock stoppage by sending the
appropriate signals to system 600. For at least the sport of ice hockey, after receiving such signals, system
600 will not only stop the game clock on scoreboard 650, but it will also automatically update the score
and initiate appropriate visual and audible cues for the spectators. Such cues are expected to include
turning on the goal lamp and initiating a selected sound such as a scoring horn through a connected sound
system.

Referring next to Fig. 21, there is depicted a block diagram showing the overall flow of information,
originating with the actual game 2-g, splitting into subjective and objective sensory systems and ultimately
ending up in a result comparison feedback loop. Starting with the events of the actual game 2-g, subjective
information is traditionally determined by coaching staff 11s. Staff 11s will retain mental observations
made during the contest 2-g and, depending upon the organization, will potentially write down or create a
database of game assessments 11ga. This recording of observations by staff 11s is typically done some
time after the conclusion of game 2-g. Such assessments may typically be communicated to database 11ga
through a computing device such as a coach's laptop or PDA. It is often the case that game film, such as
databases 102 and 202, is taken of game 2-g so that staff 11s can review this film at a later point to provide
additional assurance as to their assessments 11ga. (Currently, game film is only available via manually
operated filming systems; at a youth level, it is typically made by a parent with a video recorder.)

The present invention specifies a tracking system 100 that both films and simultaneously measures game
2-g. Tracking system 100 further automatically directs automatic game filming system 200 that is capable
of collecting game film such as 102 and 202. Symbolic tracking data determined by tracking system 100 is
analyzed by performance measurement & analysis system 700 to create objective performance
measurements 701a. Performance assessment module 700a then applies an expert system of game
interpretation rules to create objective game assessments 701b from objective performance measurements
701a. Data from subjective assessments 11ga and objective assessments 701b may then be electronically
compared, creating for example difference report 710. Report 710 may then be reviewed by coaching staff
11s as a means of refining their game perception and analysis. Furthermore, the electronic equivalent of
report 710 may also provide feedback to the performance assessment module 700a that may then use this
information to reassign weighting values to its expert system's rules. It is further anticipated that
comparison information such as provided in report 710 will be invaluable for the process of further
developing meaningful objective measurements 701a and game interpretation rules.
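
A difference report of the kind described can be sketched as a comparison of two sets of scores. The specification does not define the form of the assessments, so this example assumes both the subjective and the objective assessments are numeric ratings keyed by the same item names; the names, the rating scale and the tolerance are hypothetical, and the feedback that reassigns rule weights is not modeled.

```python
def difference_report(subjective, objective, tolerance=1.0):
    """List items where coach and system ratings differ by more than `tolerance`.

    Returns {item: (subjective, objective, gap)} with gap = objective - subjective.
    Items present in only one of the two inputs are ignored.
    """
    report = {}
    for item in sorted(subjective.keys() & objective.keys()):
        gap = objective[item] - subjective[item]
        if abs(gap) > tolerance:
            report[item] = (subjective[item], objective[item], gap)
    return report


# Hypothetical ratings on a 0-10 scale.
coach_assessments = {
    "player 10-1 passing": 8.0,
    "player 10-2 skating": 6.0,
}
system_assessments = {
    "player 10-1 passing": 5.5,
    "player 10-2 skating": 6.5,
}

report = difference_report(coach_assessments, system_assessments)
```

Only the items on which the two views disagree materially reach the report, which keeps the coaching staff's review focused on the cases the text says are useful for refining perception and rules.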

Referring next to Fig. 22, there is shown a series of perspective view representations of the overall method
embodied in the present application for the capturing of current images such as 10c, the extraction of the
foreground objects such as 10es, and the transmission of these minimal objects 10es to be later placed on
top of new backgrounds with potentially inserted advertising such as 2r-c3-1016a. Specifically, Step 1
depicts the capturing of current image 10c by perspective filming station 40c. Current image 10c includes a
background made up of tracking surface 2 and boards and glass 2b as well as multiple foreground objects
such as puck 3, player 10-1 and players 10-2&3. In Step 2, the current pan and tilt angles as well as zoom
depth coordinates 40c-ptz-1016 of the station 40c at the time image 10c was taken are used to select a
matching target background, such as 2r-c3-1016 through 2r-c3-1019. In Step 3, the target background,
such as 2r-c3-1016, is used to isolate any foreground objects such as puck 3, player 10-1 and players
10-2&3. The specific methods for this extraction were taught primarily with respect to Fig.'s 6a and 11a
through 11j. The teachings surrounding Fig. 6a primarily cover the subtraction of backgrounds especially
from fixed overhead assemblies such as 20c and 51c, while the teachings of Fig.'s 11a through 11j
additionally show how to handle the separation of potentially moving backgrounds, e.g. spectators, from
perspective assemblies such as 40c. In either case, the end result of Step 3 is the creation of extracted
foreground data blocks 10es that are the minimum portions of image 10c required to represent a valid
broadcast of a sporting event.
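
Steps 2 and 3 can be sketched as a lookup followed by a subtraction. The code below is a simplified illustration, not the method of Fig.'s 6a and 11a through 11j: it assumes grayscale frames, a fixed difference threshold, a static background, and a dictionary keyed by a hypothetical (pan, tilt, zoom) tuple. Pixels inside the bounding box that match the background are stored as None to mark them "non-foreground."

```python
def select_background(backgrounds, ptz):
    """Step 2: pick the pre-captured background for this pan / tilt / zoom."""
    return backgrounds[ptz]


def extract_foreground(current, background, threshold=25):
    """Step 3: return ((r1, c1), (r2, c2), block) or None if nothing moved.

    The block is the minimum bounding box around all changed pixels; pixels
    inside the box that still match the background are stored as None.
    """
    def changed(r, c):
        return abs(current[r][c] - background[r][c]) > threshold

    hits = [(r, c) for r in range(len(current))
            for c in range(len(current[r])) if changed(r, c)]
    if not hits:
        return None
    r1, r2 = min(r for r, _ in hits), max(r for r, _ in hits)
    c1, c2 = min(c for _, c in hits), max(c for _, c in hits)
    block = [[current[r][c] if changed(r, c) else None
              for c in range(c1, c2 + 1)]
             for r in range(r1, r2 + 1)]
    return (r1, c1), (r2, c2), block


# One pre-captured 4x5 background for a single allowed camera setting.
ptz = (10, 5, 2)
backgrounds = {ptz: [[50] * 5 for _ in range(4)]}

# Current image: the same view with a small L-shaped foreground object.
current = [row[:] for row in backgrounds[ptz]]
current[1][2] = 200
current[2][1] = 210
current[2][2] = 205

top_left, bottom_right, block = extract_foreground(
    current, select_background(backgrounds, ptz))
```

Here a 20-pixel frame reduces to a 2x2 block plus two corner coordinates, which is the kind of reduction the text relies on for transmission.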

Referring still to Fig. 22, in the next Step 4, extracted foreground data blocks 10es are transmitted along
with pan / tilt / zoom coordinates 40c-ptz-1016 identifying the particular "perspective" of the filming
station 40c when this extracted data 10es was captured. This information is then transferred, for instance
over the Internet, to a remote system for reconstruction. The present inventors anticipate that due to the
significant reductions in the original dataset, i.e. 10c, as taught in the present and related inventions,
multiple views will be transmittable in real-time over traditional high-speed connections such as a cable
modem or DSL. These views include a complete overhead view created by combining the extracted blocks
10es from each and every overhead assembly, such as 20c. Also included are perspective views such as
those taken by station 40c. Furthermore, the present inventors anticipate significant benefit to alternatively
transmitting the gradient image, such as is shown as 10g in Fig. 6b, as opposed to the actual image shown
as extracted block 10e. The gradient 10g will serve very well for the overhead view and will take up
significantly less room than the actual image 10e. Furthermore, this gradient image may then be
"colorized" by adding team colors based upon the known identities of the transmitted player images.

In any case, Step 5 includes the use of the transmitted pan / tilt / zoom coordinates 40c-ptz-1016, i.e.
"1016," to select the appropriately oriented target background image such as 2r-c3-1016a from the total
group of potential target backgrounds such as 2r-c3-1016a through 2r-c3-1019a. Note that this set of
target backgrounds to select from, e.g. 2r-c3-1016a through 2r-c3-1019a, is ideally transmitted from the
automatic broadcast system 1 to the remote viewing system 1000 (as depicted in Fig. 1) prior to the
commencement of the sporting contest. Many possibilities exist in this regard. First, these target
backgrounds can be supplied for many various professional rinks on a transportable medium such as CD or
DVD. Hence, a youth game filmed at a local rink would then be extracted and reconstructed to look as if it
was being played in a professional rink of the viewer's choice. Of course, these target background images
2r-c3 could be transmitted via Internet download. What is important is that they will reside on the remote
viewing system 1000 prior to the receiving of the continuous flow of extracted foreground object 10es
movement from one or more angles. This will result in significant savings in terms of the total bandwidth
required to broadcast a game, which will be especially beneficial for live broadcasts. Furthermore, the
present inventors anticipate using existing graphics animation technology, such as that used with current
electronic sports games such as EA Sports NHL 2004. This animation could automatically recreate any
desired background to match transmitted pan / tilt / zoom coordinates 40c-ptz-1016 for each received
extracted foreground block 10es, thereby eliminating the need to pre-store "real" background images such
as the set of target backgrounds 2r-c3-1016a through 2r-c3-1019a.

Still referring to Fig. 22, it is a further anticipated benefit of the present invention that advertisements may
be either overlaid onto the target background images 2r-c3 prior to their transmission to the remote
viewing system 1000, or they may be automatically synthesized and overlaid by programs running on the
viewing system 1000 that are also responsible for subsequently overlaying the extracted foreground stream
10es. This approach significantly improves upon current techniques that do not first separate the
foreground and background and therefore must overlay advertisements directly onto a current image such
as 10c. Furthermore, the current state of the art therefore also transmits the entire current image including
background and overlaid advertisements, if any.

And finally, after the appropriate target background image such as 2r-c3-1016a is either selected from a
pre-stored database or fully or partially synthesized via traditional computer animation techniques, the
foreground stream 10es is placed onto the selected / recreated background in accordance with the
transmitted minimum bounding box corner coordinates 10es-bc. Within the overlaid extracted blocks 10es,
any null or similarly denoted "non-foreground" pixels are replaced with the value of the associated pixel
within selected target background image 2r-c3-1016a. The resulting image 11c is then presented to the
viewer.
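
This final overlay step can be sketched in a few lines. The example assumes grayscale images stored as nested lists, uses None as the "null" marker for non-foreground pixels, and takes the top-left corner of the bounding box as the placement coordinate; these representations and the function name are assumptions, not details given in the specification.

```python
def composite(target_background, block, top_left):
    """Overlay an extracted block onto a copy of the target background.

    Pixels stored as None are treated as "non-foreground" and leave the
    corresponding background pixel showing through.
    """
    result = [row[:] for row in target_background]
    r1, c1 = top_left
    for dr, row in enumerate(block):
        for dc, value in enumerate(row):
            if value is not None:
                result[r1 + dr][c1 + dc] = value
    return result


# A 3x4 target background (for example a professional rink) of uniform value 9.
target = [[9] * 4 for _ in range(3)]

# Received foreground block with one null pixel, placed at row 1, column 1.
received_block = [[None, 200],
                  [210, 205]]

image = composite(target, received_block, (1, 1))
# image == [[9, 9, 9, 9],
#           [9, 9, 200, 9],
#           [9, 210, 205, 9]]
```

The null pixel at the block's top-left keeps the background value 9, so the rectangular bounding box never shows as a visible patch around the player.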

Referring next to Fig. 23, there is shown on the left Stream A 10c-db. This first Stream A 10c-db depicts
eight individual full-frames, such as 10c-F01, 10c-F06, 10c-F11 through 10c-F36, that are from a series
of thirty-six original current images 10c. These images 10c were either captured by an assembly such as
20c or 40c or constructed from the multiple views of the overhead assembly matrix 20cm (depicted in Fig.
3) as taught in the present application. Current state of the art systems work with full frame series such as
Stream A 10c-db when providing their sports broadcast. Such streams are typically first reduced in size
using the industry standard MPEG compression methods. As is known by those skilled in the art, MPEG
and similar techniques are faced with having to compress full-frame images such as 10c-F06 as a function
of the pixel information contained at least in the full-frames preceding 10c-F06, such as 10c-F01 through
10c-F05 (not depicted). This process of frame-to-frame cross comparison and encoding is time consuming
and not as effective as the present invention for reducing the final transmitted image size.

Still referring to Fig. 23, next to full-frame Stream A 10c-db is shown sub-frame Stream B 10es-db. Each
sub-frame, such as 10c-es01, 10c-es06, 10c-es11 through 10c-es36, represents just those portions of a given
full-frame current image 10c that contain one or more foreground objects. Note that in any given current
image 10c, zero or more distinct sub-frames such as 10c-es01 may be present. (In Fig. 23, each current
image contained exactly one sub-frame, although this is neither a restriction nor requirement.) Each
sub-frame comes encoded with the coordinates, such as (r1, c1) and (r2, c2), defining its appropriate location in
the original current frame 10c. These coordinates are one way of designating the location of the
bounding box, such as 10e-1 shown in Fig. 7a. Other encoding methods are possible, as will be understood
by those skilled in the art. What is important is that the present inventors teach an apparatus and method
for extracting moving foreground objects from either fixed or moving backgrounds and transmitting this
minimal sub-frame dataset, for instance 10c-es01 through 10c-es36, along with coordinate information
such as corner locators (r1, c1) and (r2, c2) necessary to place the sub-frame into a pre-transmitted target
background as previously discussed in Step 6 of Fig. 22.
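
As one illustration of a sub-frame that "comes encoded with the coordinates," the sketch below prefixes the pixel data with the two corner locators packed as four unsigned 16-bit integers. The byte layout, the one-byte-per-pixel grayscale format and the 640 x 480 frame size are assumptions made for the example; the specification notes that other encodings are possible and does not prescribe one.

```python
import struct

# Hypothetical header: r1, c1, r2, c2 as big-endian unsigned 16-bit integers.
HEADER = struct.Struct(">HHHH")


def encode_subframe(r1, c1, r2, c2, pixels):
    """Prefix raw sub-frame pixels (one byte each) with the corner locators."""
    expected = (r2 - r1 + 1) * (c2 - c1 + 1)
    if len(pixels) != expected:
        raise ValueError(f"expected {expected} pixel bytes, got {len(pixels)}")
    return HEADER.pack(r1, c1, r2, c2) + pixels


def decode_subframe(payload):
    """Recover the corner locators and the pixel bytes from a payload."""
    r1, c1, r2, c2 = HEADER.unpack_from(payload)
    return (r1, c1), (r2, c2), payload[HEADER.size:]


# A 40-row by 60-column sub-frame located inside a 480 x 640 full frame.
subframe_pixels = bytes(40 * 60)
payload = encode_subframe(100, 200, 139, 259, subframe_pixels)

full_frame_bytes = 480 * 640       # 307200 bytes before any compression
subframe_bytes = len(payload)      # 2408 bytes: 8 header + 2400 pixels
```

In this toy case the sub-frame payload is under one percent of the full frame, before either stream is compressed further.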



And finally, also depicted in Fig. 23 is a third series of tracked motion vectors 10y-db corresponding to
successive image sub-frames. For instance, after first sub-frame 10c-es01, the path of the detected
foreground object follows the vector 10-mv06. These vectors are meant to represent either some or the
entire larger database of tracking information 101 as first shown in Fig. 1. Hence, the present inventors not
only teach the transmission of a minimized data Stream B 10es-db of extracted foreground object blocks,
such as 10-es06, they also teach the simultaneous transmission of motion vector 10-mv06 and related
digital measurement information. Such digital measurement information, as taught in the present invention,
provides significant potential for quantifying and qualifying participant performance, providing statistics
and analysis, and is a basis for automatically generated "synthesized audio" commentary.

Referring next to Fig. 24, there are shown the same two Streams A and B of full-frames 10c-db and
sub-frames 10es-db, respectively, as depicted in Fig. 23. In this figure, both full-frame Stream A 10c-db and
sub-frame Stream B 10es-db are shown in a perspective view meant to visualize a data transmission flow.
As stated with reference to Fig. 23, Stream A 10c-db is most often first compressed using methods such as
those taught in association with industry standard MPEG. As can be seen by the portrayal of Stream A
10c-db, its overall size prior to compression is both the maximum and significantly greater than sub-frame
Stream B 10es-db. As will be discussed later in relation to Fig. 25, there is no limitation restricting
sub-frame Stream B 10es-db from also being similarly compressed by traditional methods such as MPEG.
However, before any such compression takes place, the present inventors prefer altering Stream B 10es-db
so that it is no longer in its original variable bandwidth format as shown in Fig. 24. Specifically, each
sub-frame such as 10c-es01, 10c-es06, 10c-es11 through 10c-es36 may take any full or partial portion of
original corresponding images, such as 10c-F01, 10c-F06, 10c-F11 through 10c-F36, respectively. Hence,
while each transmitted full-frame in Stream A 10c-db is originally of the same size and therefore easily
registered to one another for compression, each transmitted sub-frame in Stream B 10es-db is neither of
the same size nor easily registered. This transformation from variable bandwidth sub-frame Stream B
10es-db into rotated and centered fixed bandwidth sub-frame Stream B1 10es-db1 is discussed in relation to
upcoming Fig. 25 and was first taught in relation to Fig.'s 6d and 6e.

Referring next to Fig. 25, there is shown first the same variable bandwidth sub-frame Stream B 10es-db as
depicted in Fig. 24 next to a corresponding rotated and centered fixed bandwidth sub-frame Stream B1
10es-db1. Specifically, each sub-frame of Stream B 10es-db is first evaluated to determine if it contains
one or more identified participants such as a player 10. In the simplest case, where each sub-frame contains
a single identified player 10 based upon helmet sticker 9a, that sub-frame may be rotated, for instance such
that the player's helmet sticker 9a is always pointing in a pre-designated direction; depicted as Step 1. In
general, this will tend to orient the player 10's body in a similar direction from sub-frame to sub-frame. It
is anticipated that this similar orientation will facilitate known frame-to-frame compression techniques
such as MPEG or the XYZ method, both well known in the art. Note that this rotation facilitates
compression and requires the transmission of the rotation angle to the viewing system, such as 1000, so
that the decompressed sub-frames can be rotated back to their original orientations.

Furthermore, this rotation concept is most easily understood with respect to extracted foreground blocks
such as 10c-es01 taken from overhead images 10c as captured from assemblies such as 20c. However,
similar concepts are possible with respect to foreground object blocks extracted from perspective view
images 10c as captured from assemblies such as 40c. Hence, players 10 viewed from the perspective can
still be aligned facing forwards and standing up based upon the orientation information gathered by the
tracking system 100. For instance, if a series of perspective-view sub-frames show a given player skating
back towards his own goal, then these images could be flipped vertically, making the player appear to be
facing the opponent's goal. The present inventors anticipate that such alignment may facilitate greater
compression when processed by existing methods, especially those like XYZ that favor "slower moving,"
highly aligned objects.

Referring still to Fig. 25, in order to make a fast moving object such as a hockey player 10 skating at full
speed appear to be a slow moving object (i.e. with respect to the background and image frame center, such
as a person standing in a teleconference), the present inventors teach the method of centering each original
sub-frame, such as 10c-es01, into a carrier frame 10c-esCF. This is shown as Step 2. In this way, a highly
regular succession of video frames is created for compression by traditional methods, again such as MPEG
or preferably XYZ, as will be understood by those skilled in the art. The resultant minimum "motion"
between frames off of the centered axis 10es-dbAx provides a highly compressible image file. As was
taught first in relation to Figs. 6d and 6e, it is also desirable to zoom or expand individual extracted
sub-frames such as 10c-es01 so that the overall pixel area of each aligned player remains roughly the same
from frame-to-frame, thereby facilitating traditional compression methods. This will require that the zoom
setting also be transmitted per sub-frame.
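
As a minimal sketch of the centering of Step 2 only (rotation and zoom are omitted), a sub-frame can be placed in the middle of a fixed-size carrier frame while the offset is recorded so that the receiver can recover the block. The function names, the `fill` value for unused carrier pixels and the list-of-rows representation are assumptions made for this example.

```python
def center_in_carrier(sub_frame, carrier_rows, carrier_cols, fill=None):
    """Center `sub_frame` in a carrier frame of fixed size.

    Returns the carrier frame and the (top, left) offset that would have to be
    transmitted alongside it."""
    rows, cols = len(sub_frame), len(sub_frame[0])
    if rows > carrier_rows or cols > carrier_cols:
        raise ValueError("sub-frame does not fit the carrier frame and would have to be split")
    top = (carrier_rows - rows) // 2
    left = (carrier_cols - cols) // 2
    carrier = [[fill] * carrier_cols for _ in range(carrier_rows)]
    for d_row, row in enumerate(sub_frame):
        carrier[top + d_row][left:left + cols] = row
    return carrier, (top, left)


def extract_from_carrier(carrier, offset, rows, cols):
    """Inverse step on the receiving side: cut the block back out of the carrier."""
    top, left = offset
    return [row[left:left + cols] for row in carrier[top:top + rows]]


carrier, offset = center_in_carrier([[1, 2], [3, 4]], 4, 6)
```

Because every carrier frame has the same dimensions and the player sits at its center, successive frames differ mainly in pose rather than in position, which is the property the text relies upon.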

Referring next to Fig. 26, there is shown first the same rotated and centered fixed bandwidth sub-frame
Stream B1 10es-db1 as depicted in Fig. 25 next to a corresponding Stream B2 10es-db2 whose individual
sub-frames have been "scrubbed" to remove all detected background pixels. As previously discussed,
especially in relation to Step 6 of Fig. 6a, after the extraction of foreground blocks such as 10c-es01,
10c-es06, 10c-es11 and 10c-es36, these blocks are then examined by hub 26 to remove any pixels determined
to match the pre-stored background image. The result is scrubbed extracted blocks such as 10c-es01s,
10c-es06s, 10c-es11s and 10c-es36s respectively. These "scrubbed" sub-frames are more highly compressible
using traditional techniques such as MPEG and XYZ.
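
The scrubbing step can be illustrated with a short sketch in which every pixel of an extracted block that matches the pre-stored background within a tolerance is replaced by a null value. The per-channel tolerance test, the default tolerance of 10 and the use of `None` as the null value are assumptions for this example; the specification does not state how a match is decided.

```python
def scrub_block(block, background_block, tolerance=10):
    """Replace pixels that match the stored background with None.

    `block` and `background_block` are equally sized lists of rows of
    (R, G, B) tuples; a pixel matches when every channel differs by no more
    than `tolerance` (an assumed criterion)."""
    scrubbed = []
    for row, background_row in zip(block, background_block):
        scrubbed.append([
            None
            if all(abs(p - b) <= tolerance for p, b in zip(pixel, background_pixel))
            else pixel
            for pixel, background_pixel in zip(row, background_row)
        ])
    return scrubbed


# Hypothetical values: an ice-colored background pixel and a red jersey pixel.
ice = (235, 240, 245)
result = scrub_block([[(233, 241, 244), (200, 30, 40)]], [[ice, ice]])
```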

With respect to Fig. 22 through Fig. 26, the present inventors are teaching general concepts for the
reduction in the video stream to be broadcast. By reducing the original content via foreground extraction
and then by rotating, centering, zooming and scrubbing the extracted blocks as they are placed into carrier
frames, the resulting Stream B2 10es-db2 as shown in Fig. 26 is significantly smaller in original size and
more compressible in format using traditional methods well known in the art. These techniques require the
embedding of operating information into the video stream such as the rotation angle and zoom factor as
well as the image offset to carrier frame 10c-esCF axis 10es-dbAx. When combined with the pan and tilt
angles as well as zoom depths of perspective filming assemblies such as 40c and the location of fixed
overhead assemblies such as 20c, all with respect to three-dimensional venue model 2b-3dm1, the present
invention teaches new methods of video stream compression that go beyond the state of the art.
Furthermore, embedded information can include indicators defining the sub-frames as either containing
video or gradient image information. In a dynamic compression environment, the automatic broadcast
system 1 is anticipated to vary the total number of video feeds transmitted as well as the basis
for representation, i.e. video or gradient, on a frame by frame basis as the available transmission bandwidth
fluctuates. Additional techniques as taught with respect to Fig. 6b allow further compression beyond
traditional methods by recognizing the limited number of colors expected to be present in a foreground
only video stream. Hence, rather than encoding a potential 256 shades of red, blue and green for each pixel
so as to be able to represent any possible color, the present invention teaches the use of a smaller 4, 16 or
32 combination code where each code represents a single possible color tone as known prior to the
sporting contest.
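
The saving from a limited color tone code can be made concrete with a small sketch. Encoding 256 shades of each of three channels takes 24 bits per pixel, whereas a 4, 16 or 32 entry code takes 2, 4 or 5 bits. The tone table below, the nearest-tone matching rule and the function names are hypothetical and serve only to illustrate the idea.

```python
def tone_code_bits(num_tones):
    """Bits needed to address one of `num_tones` color tone codes."""
    return (num_tones - 1).bit_length()


def encode_pixels(pixels, tone_table):
    """Map each (R, G, B) pixel to the index of the closest known color tone.

    Closeness is measured by squared Euclidean distance in RGB, which is an
    assumption made for this sketch."""
    def nearest(pixel):
        return min(
            range(len(tone_table)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, tone_table[i])),
        )
    return [nearest(pixel) for pixel in pixels]


# Hypothetical four-tone table: white, jersey red, jersey blue, skin.
TONES = [(255, 255, 255), (200, 0, 0), (0, 0, 150), (230, 190, 160)]
codes = encode_pixels([(250, 250, 250), (190, 10, 5), (5, 5, 140)], TONES)
bits_saved_per_pixel = 24 - tone_code_bits(len(TONES))
```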

Several exception situations to these methods are anticipated by the present inventors. For instance, a given
sub-frame will often contain more than one participant or player such as 10. Depending upon the detected
overlap as determinable for both the overhead and perspective views based upon the player orientation in
the tracking database 101, the present inventors prefer automatically "cutting" the sub-frame along a
calculated line best separating the known centers of the two or more visually co-joined players. Each time
a sub-frame is split, it simply becomes a smaller sub-frame with its own bounding box corners (r1, c1) and
(r2, c2). It is immaterial if any given sub-frame contains only portions of a main player 10 along with
sub-portions of visually overlapping players, since ultimately all of the sub-frames will be reset to their
original locations within the final viewing frame 11c shown in Fig. 22.
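
One possible reading of the "calculated line best separating the known centers" is the perpendicular bisector between two player centers, which is equivalent to assigning every pixel to its nearest center. The sketch below uses that rule; it is an assumption, as the specification does not fix how the line is calculated, and the function names are illustrative.

```python
def split_by_centers(r1, c1, r2, c2, centers):
    """Assign every pixel of the bounding box (r1, c1)-(r2, c2) to its nearest
    player center, returning {center_index: [(row, col), ...]}."""
    parts = {index: [] for index in range(len(centers))}
    for r in range(r1, r2 + 1):
        for c in range(c1, c2 + 1):
            nearest = min(
                range(len(centers)),
                key=lambda i: (r - centers[i][0]) ** 2 + (c - centers[i][1]) ** 2,
            )
            parts[nearest].append((r, c))
    return parts


def bounding_box(pixels):
    """Corner locators (r1, c1), (r2, c2) of a group of pixel coordinates."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols)), (max(rows), max(cols))


# Two hypothetical players side by side in a 2x6 sub-frame.
parts = split_by_centers(0, 0, 1, 5, centers=[(0, 1), (0, 4)])
boxes = [bounding_box(parts[i]) for i in sorted(parts)]
```

When the dividing line is not axis-aligned the two groups are not rectangular, so their bounding boxes may overlap; the groups themselves never do.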

Also, there are anticipated advantages for creating a carrier frame 10c-esCF of preset dimensions. One
preferred size would be one-quarter the size of a normal full-frame. The presetting of the carrier frame
10c-esCF dimension could be beneficial for the application of traditional image compression methods such as
MPEG and XYZ. In this case, the present inventors anticipate that the sub-frames will not always "fit"
within the carrier frame 10c-esCF and must therefore be split. Again, this less frequent need for splitting
larger sub-frames to fit smaller carrier frames will not affect the ultimate reconstruction of final viewing
frame 11c. It is further anticipated that the size of the carrier frame 10c-esCF can be dynamically changed
to fit the zoom depth and therefore the expected pixel area size of foreground objects, such as player 10.
Hence, since the overhead assemblies 20c have fixed lenses, individual players 10 will always take up
roughly the same number of image frame 10c pixels. In this case, the present inventors prefer a carrier
frame that would always include some multiple of the expected size. For the perspective filming
assemblies such as 40c, the size of a player 10 is anticipated to vary directly proportional to the known
zoom depth. Therefore, the present inventors anticipate dynamically varying the size of the carrier frame
10c-esCF as a function of the current zoom value. Note that in a single broadcast that includes multiple
game feeds such as Stream B2 10es-db2, it is anticipated that each feed will have its own dynamically set
variables such as the carrier frame 10c-esCF size.

The present inventors anticipate significant benefit with the transmission of the gradient image 10g first
shown in Fig. 6a after it has been translated into some form of either a vector or boundary encoded
description. Hence, the gradient images 10g, like the full color extracted foreground images 10es, can either
be represented in their original bitmap form, or they can be converted to other forms of encoding well
known in the art. Two tone, or "line art" images, such as the gradient image 10g, are ideal for
representation as a set of curves, or b-splines, located in the image space. The gradient images 10g could
also be represented using what is known in the art as a chain code, essentially tracing the pixel-by-pixel path
around each line of the gradient image. At least the conversion to b-splines and the representation of a
bitmap, or raster image, as a vector image is well known in the art and especially used in both JPEG and
MJPEG standards. The present inventors anticipate that these spatial compression methods may prove more
advantageous to the compression of both the gradient image 10g and extracted foreground images 10es
than more traditional temporal compression methods such as motion estimation specified for MPEG and
XYZ. More specifically, the present inventors teach the extraction of foreground objects and their
conversion into separate color tone regions, where each separated region is therefore more like a gradient
image 10g. Each region can either be defined by a linked list of pixel locations, or a chain code, or by a set
of b-splines. Regardless of the method for describing the exterior boundary of the region, its interior can be
represented by a single code denoting the color tone contained within that region. Depending upon the pixel
area contained within the region as compared to the length of the perimeter boundary describing the
region, this conversion to vector or coded method can offer significant bandwidth savings. The final stream
of region locations and contained color tones can then be automatically reconstructed into video-like
images for placement onto the properly selected or recreated backgrounds as summarized in Fig. 22.
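
A chain code of the kind referred to above can be sketched with the common eight-direction (Freeman) convention, in which a boundary is stored as one starting pixel followed by a direction code of 0 to 7 per step. The direction numbering (0 pointing east, increasing counter-clockwise, with rows increasing downward) and the function names are assumptions for this example.

```python
# (d_row, d_col) for codes 0..7: E, NE, N, NW, W, SW, S, SE (rows grow downward).
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def chain_encode(path):
    """Encode a path of 8-connected (row, col) pixels as (start, [codes])."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        codes.append(DIRECTIONS.index((r1 - r0, c1 - c0)))
    return path[0], codes


def chain_decode(start, codes):
    """Rebuild the pixel path from its starting pixel and direction codes."""
    path = [start]
    for code in codes:
        d_row, d_col = DIRECTIONS[code]
        row, col = path[-1]
        path.append((row + d_row, col + d_col))
    return path


start, codes = chain_encode([(2, 2), (2, 3), (1, 4), (0, 4)])
```

Each step costs three bits instead of a full pair of coordinates, which is where the bandwidth saving for long boundaries with small interiors comes from.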
Referring next to Fig. 27, there is shown a perspective view of a filming assembly 40c as it captures
background images 2r (first depicted in Fig. 6a) from the venue prior to filming a live event with a moving
foreground such as players 10, or moving background such as spectators 13. As was previously taught,
especially in Figs. 11a through 11j and summarized in Fig. 22, these background images 2r are associated
with the capturing assembly's current pan, tilt and zoom coordinates 40c-ptz and stored separately in a
single database per assembly 40c. Fig. 27 illustrates the concept of storing the net combination of each of
these individual background images 2r, that may highly overlap, into a single background panoramic
database 2r-pdb associated with a given assembly 40c and that assembly's fixed view center 40c-XYZ. As
will be understood by those skilled in the art, the determination of the three dimensional (X, Y, Z)
coordinates of the axis of rotation of sensor 45s within assembly 40c provides a method of calibrating
each pixel of each image 10s captured. This calibration will relate not only to the venue's
three-dimensional model 2b-3dm2 but also to the overhead tracking assemblies such as 20c. For reasons that
will be explained in association with upcoming Fig. 28, the zoom depth of assembly 40c is preferably set
to the maximum when collecting this panoramic database 2r-pdb, and as such the only variables depicted
in Fig. 27 are for the pan and tilt angles.

Specifically, each assembly 40c will be controllably panned and tilted throughout a predetermined
maximum range of motion that can be expressed as angular degrees. In practice, the present inventors
anticipate that the maximum pan range will be less than 180° while the maximum tilt range will be less
than 90°. Regardless, as was previously taught, sensor 45s (shown as a grid in expanded view) will be
restricted to capturing images at increments of a minimum pan angle Dp and minimum tilt angle Dt.
Therefore, every background image 2r captured by sensor 45s will have a unique pan coordinate equal to
n * Dp, where n is an integer between 1 and Xp, such that the pan coordinate is > 0° and typically < 180°.
Similarly, every background image 2r captured by sensor 45s will have a unique tilt coordinate equal to
m * Dt, where m is an integer between 1 and Xt, such that the tilt coordinate is > 0° and typically < 90°.
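
The restriction to whole multiples of the minimum pan and tilt angles amounts to snapping any requested angle onto an integer grid. A minimal sketch follows; the step of 0.25°, the maximum index of 719 and the function name are hypothetical values chosen so that the pan coordinate stays above 0° and below 180°.

```python
def snap_to_grid(angle_deg, step_deg, max_index):
    """Return (index, angle) for the allowed coordinate nearest to `angle_deg`,
    where the allowed coordinates are index * step_deg for index in 1..max_index."""
    index = round(angle_deg / step_deg)
    index = max(1, min(max_index, index))  # keep within the allowed range
    return index, index * step_deg


# Hypothetical pan grid: Dp = 0.25 degrees and Xp = 719 (so the largest pan is 179.75).
n, pan = snap_to_grid(12.6, step_deg=0.25, max_index=719)
```

The integer index is what would address a background pixel in the panoramic database, so the same (m, n) pair always refers to the same stored value.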

Still referring to Fig. 27, at any given set of (m, n) pan / tilt coordinates, sensor 45s will be exposed to
some portion of the fixed background that may be at any depth from the assembly 40c, such as surfaces
2r-s1, 2r-s2 and 2r-s3. Typically, these surfaces are expected to be in the range from 5' to 300' away from
assembly 45s. (For the purposes of maintaining a single maximum zoom during this step of collecting
background images 2r for the construction of panoramic background database 2r-pdb, it is preferable that
all potential surfaces be in focus throughout the entire range from the nearest to farthest distance. This
requirement will at least dictate the ultimate position of assembly 40c so as to fix the distance to the closest
surface to be greater than some minimum as determined by the assembly camera lens options, as will be
understood by those skilled in the art.) The varying distance to each surface, 2r-s1, 2r-s2 and 2r-s3, will
result in a differing surface area captured onto any one given pixel of sensor 45s. Hence, the farther away
the surface, such as 2r-s3 versus 2r-s1, the larger a surface area each single pixel such as 45sPx will
represent, e.g. 45sP3 versus 45sP1 respectively. It is anticipated that the fixed background in the venue
will not change in any significant way between initial calibration and the filming of multiple games over
time. However, if the fixed background is expected to change, then panoramic database 2r-pdb may need
to be updated accordingly.
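
The dependence of the imaged surface area on distance can be illustrated with simple geometry. Assuming a surface perpendicular to the line of sight and a pixel that subtends a fixed angle, the width covered by one pixel grows linearly with distance and the area grows with its square. The per-pixel angle of 0.01° and the function name are hypothetical; the specification gives no such value.

```python
import math


def footprint_width(distance, pixel_angle_deg):
    """Width of surface seen by one pixel at `distance`, for a pixel subtending
    `pixel_angle_deg`, assuming the surface faces the camera squarely."""
    return 2.0 * distance * math.tan(math.radians(pixel_angle_deg) / 2.0)


# Nearest and farthest surfaces of the 5' to 300' range mentioned in the text.
near = footprint_width(5.0, 0.01)
far = footprint_width(300.0, 0.01)
width_ratio = far / near          # about 60
area_ratio = width_ratio ** 2     # about 3600 for a square pixel
```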

Regardless of the actual background surface area viewed, for each single pixel 45sPx captured by sensor
45s and for all allowed pan angles Dp and tilt angles Dt of assembly 40c, preferably pixel 45sPx's RGB or
YUV value is stored in panoramic database 2r-pdb. Within database 2r-pdb, each pixel such as 45sPx is
addressable by its (m, n) coordinates, e.g. (m = angle 447 * Dt and n = angle 786 * Dp as shown). As
previously stated, each captured pixel 45sPx will represent a given surface area such as 45sP1, 45sP2 or
45sP3 on depth varied surfaces 2r-s1, 2r-s2 and 2r-s3 respectively. As will be understood by those skilled
in the art, depending upon the repeatability of the pan and tilt control mechanisms with respect to the
minimum pan and tilt angles 40-ptz, each time that assembly 40c returns to the same coordinates 40-ptz,
pixel 45sPx will capture the same physical area 45sP1, 45sP2 or 45sP3. In practice, the present inventors
anticipate that, especially when using maximum zoom depth, or when considering the surfaces farthest
away such as 2r-s3 giving area 45sP3, the repeated pixel information will not be exact. This is further
expected to be the case as the venue undergoes small "imperceptible" physical changes and/or experiences
different lighting conditions during a game versus the initial calibration. However, within a tolerance, as
will be understood in the art, these small changes can be characterized as background image noise and
dealt with via techniques such as interpolation with neighboring pixels to always yield an average pixel for
45sP1, 45sP2 or 45sP3 which can be stored as 45sPv rather than the actual captured value 45sPx.
Furthermore, especially when working in the YUV color domain, fluctuations in venue lighting can be
addressed with a larger tolerance range on the Y (luminance) value than is necessary for the UV (hue,
saturation) values.

Still referring to Fig. 27, as assembly 40c sweeps in any direction, a single pixel such as 45sPx will move
across the sensor array 45s. As will be understood by those skilled in the art, due to image distortion
caused by the optics of the chosen lens, the actual background image area, such as 45sP3, 45sP2 and
45sP1, may not be identically captured for all successive increments of movement. Hence, at any given
depth such as 2r-s2, due to image distortion the actual surface area such as 45sP2 captured per each pixel
of sensor 45s, such as 45sPx versus 45sPy versus 45sPz, will not be identical. It will be understood that
pixels radiating outward from the middle of sensor 45s will tend to capture progressively larger portions of
the image surface. Therefore, pixel 45sPx will have less distortion and will capture less actual surface area
such as 45sP2 than will pixel 45sPy. In turn, pixel 45sPy will have less distortion and will capture less
actual surface area such as 45sP2 than will pixel 45sPz. This distortion will limit the number of captured
pixels such as 45sPx from sensor 45s that can reliably be used to build initial panoramic database 2r-pdb.
This is because during live filming via assemblies 40c, although each current image 10c (previously
depicted) will be captured only at allowed minimum increments of pan and tilt angles Dp and Dt, it is
unlikely that any given captured image will be at exactly the same pan, tilt (and zoom) coordinates 40c-ptz
for which a single original background image was centered. Therefore, as the pre-stored background is
extracted from the panoramic database 2r-pdb for subtraction from the current image 10c, the individual
background pixels such as 45sPv may represent slightly varying portions of the venue background. This
would be especially true where the current image 10c pixel is towards the outermost portion of the image
sensor 45s, such as 45sPz, whereas its corresponding pixel such as 45sPv in database 2r-pdb was an
innermost pixel such as 45sPx.

The present inventors prefer using three main approaches to handling this background image distortion
beyond choosing appropriate optics configurations for minimum distortions, as will be understood by those
skilled in the art. First, each captured background image 2r, whose individual pixels will contribute to
pre-stored background panoramic 2r-pdb, can be transformed via a matrix calculation to remove image
distortion as is well known in the art. Hence, by use of standard lens distortion correction algorithms, the
background image captured by sensor 45s can better approximate a fixed surface area per pixel, such as
45sPx, 45sPy and 45sPz. Note that when the background image 2r is extracted from panoramic database
2r-pdb for subtraction from a current image 10c, the transformation matrix can be reapplied so as to better
match the effective distortion in the current image 10c pixels. The second approach can be used either
alternatively, or in combination with the use of a transformation matrix as just described. What is preferred
is that the actual pixels used from sensor 45s for each initial background image 2r captured to build
database 2r-pdb are limited to those with acceptable distortion. Hence, only the "interior" pixels such as
those clustered near the center of sensor 45s, for instance 45sPx, are used to build database 2r-pdb.
Obviously, the fewer the pixels used, all the way down to only a single central pixel, the geometrically
proportionately more total background images 2r must be captured to create panoramic database 2r-pdb.
The third approach preferred by the present inventors for minimizing image distortion in the panoramic
background database 2r-pdb is to capture this original database in a zoom setting at least one multiple
higher than the highest setting allowed for game filming. As will be understood by those skilled in the art,
this is essentially over-sampling the venue background, where over-sampling is a common technique for
removing signal noise, in this case represented by pixel distortion. For instance, ideally each captured and
stored pixel, such as 45sPv in database 2r-pdb, will be no more than 1/9th of the size of any pixel captured
in a current image, as will be depicted in the upcoming Fig. 28.

Referring next to Fig. 28, there is shown a similar depiction of a perspective view of filming assembly 40c
capturing images of background surfaces such as 2r-s1, 2r-s2 and 2r-s3. What is different in Fig. 28 is
that these image captures are meant to represent the live filming stage rather than the calibration step of
building the panoramic background database 2r-pdb. Furthermore, what is shown is the effect of zooming
on the correlation between any given pixel in the current image captured on sensor 45s, such as 45sPx, and
the corresponding pixels in the panoramic database 2r-pdb. Essentially, for a given surface depth such as
2r-s3, an original background pixel representing surface area 45sP3 was captured and saved as 45sPv in
database 2r-pdb. When filming, the maximum zoom depth is preferably limited to DZ = 3 * Dt in the
vertical direction and DZ = 3 * Dp in the horizontal direction. Hence, during filming, assembly 40c will
never be directed to zoom in closer than nine times the area of the originally captured background pixels,
as would be represented by the area of 45sP3. Obviously, it is preferable to use an image sensor 45s with
square pixels and one-hundred percent fill factor, as will be understood by those skilled in the art. Note that
by choosing the maximum zoom depth to yield captured surface areas nine times the size of an originally
captured background pixel such as represented by area 45sP3, the image distortion noise is minimized by
the averaging of nine samples, shown as 45z1, to create a comparison for a single sensor pixel such as
45sPx. Furthermore, the practical mathematic equations are simplified because the simulated or average
pixel created from the nine samples 45z1 is exactly centered on the target current image pixel 45sPx.
Still referring to Fig. 28, after the initial maximum zoom setting of 3Dp / 3Dt, decreasing zoom settings
are shown to preferably change in both the horizontal and vertical directions by two increments of the
minimum pan and tilt angles, Dp and Dt respectively. In other words, if the maximum allowed filming
zoom causes pixel 45sPx to image the area of 45sP3-9 that is effectively nine times the area constrained by
the minimum pan and tilt angles, Dp and Dt respectively, then the next lowest zoom setting will cover the
area of 45sP3-25 that is effectively twenty-five times the minimum area of 45sP3, with a setting equivalent
to 5Dp / 5Dt. Again, note that filming camera 40c's image sensor 45s is ideally always aligned to capture
images at pan and tilt angles that ensure that each of its pixels, such as 45sPx, is centered around a single
pixel, such as 45sPv in panoramic database 2r-pdb. In this way, depending upon the particular zoom
setting, each single pixel of the currently captured image 10c will always correspond to a whole multiple of
total background pixels, such as the nine pixels in square 45z1 or the twenty-five in square 45z2. As
previously discussed with relation to Fig. 27, each individually stored pixel, such as 45sPv, in panoramic
database 2r-pdb has ideally been limited or transformed in some way to minimize its distortion. This may
take the form of only storing the "innermost" sensor pixels, applying a transformation matrix to remove
distortion or interpolation with neighboring cells. Regardless, when stored pixels such as those contained
in database square 45z1 or 45z2 are themselves interpolated to form individual comparison pixels, the
present inventors anticipate applying a transformation matrix scaled to the zoom setting to effectively warp
the resulting comparison to match the expected distortion in the current image 10c.
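
The formation of a comparison pixel from an odd-sized square of stored panoramic pixels (nine samples at a 3 x 3 setting, twenty-five at 5 x 5) can be sketched as a centered average. The plain arithmetic mean, the single-channel values and the function name are assumptions for this example; the distortion-matching warp mentioned above is not modeled, and no handling of squares that run off the edge of the panorama is attempted.

```python
def comparison_pixel(panorama, m, n, zoom_multiple):
    """Average the zoom_multiple x zoom_multiple square of stored pixels
    centered on panoramic coordinate (m, n).

    `panorama` is a list of rows of single-channel values; `zoom_multiple`
    must be odd (3, 5, ...) so that the square has a center pixel."""
    if zoom_multiple % 2 == 0:
        raise ValueError("zoom multiple must be odd so the square is centered")
    half = zoom_multiple // 2
    samples = [
        panorama[i][j]
        for i in range(m - half, m + half + 1)
        for j in range(n - half, n + half + 1)
    ]
    return sum(samples) / len(samples)


# Hypothetical 5x5 patch: uniform background of 10 with one noisy stored pixel of 19.
patch = [[10] * 5 for _ in range(5)]
patch[2][2] = 19
at_max_zoom = comparison_pixel(patch, 2, 2, 3)   # nine samples
at_next_zoom = comparison_pixel(patch, 2, 2, 5)  # twenty-five samples
```

In this example the single noisy sample shifts the nine-sample average by 1 and the twenty-five-sample average by 0.36, which illustrates why over-sampling suppresses per-pixel noise.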
There are three major anticipated benefits to creating a panoramic background database 2r-pdb versus
creating a database of individually stored background images, for instance 2r-c3-1016 through 2r-c3-1019
as depicted in Fig. 22. All three benefits are related to the significant reduction in storage requirements for
the panoramic versus individual image approach. The first major benefit to the reduced storage
requirements is that it becomes easier to build the storage media, i.e. disk or even memory, directly into the
filming assemblies such as 40c. The second major benefit is the greatly reduced transmission bandwidth
requirements, making it at least feasible to send panoramic database 2r-pdb via network connections to
remote system 1000, whereas it would be prohibitive to transmit individual frames such as 2r-c3-1016
through 2r-c3-1019. And finally, the overall storage requirements on remote system 1000 are also
significantly reduced, especially when considering that ideally several panoramic databases 2r-pdb are
resident so as to support six or more preferred filming assemblies 40c.

Referring next to Fig. 29a, there is depicted the flow of data from where it is originally captured by the
overhead tracking assemblies 20cm, which film and track the game, to where it finally ends up being
assembled into a broadcast by encoder 904. Specifically, all of the overhead film and tracking information
begins as streams of current images 102a as output by overhead assemblies 20cm. As previously discussed,
for each camera in the overhead assemblies 20cm, there are associated background image(s) 103 that are
pre-stored and optionally updated to reflect continual background changes. After applying background
images 103 to the stream of current images 102a, a new dataset of subtracted & gradient images 102b is
created. From this dataset, image analysis methods as previously discussed create symbolic dataset 102c as
well as streams of extracted blocks 102d. Information from the symbolic database 102c and extracted
blocks 102d is then used to create tracking database 101, which records the movement of all participants
and game objects in game 2-g. Also available by the use of well known stereoscopic image analysis,
extracted blocks taken of the same participants from different overhead cameras provide topological
profiles 105. As tracking database 101 accumulates in real-time, a performance measurements & analysis
database 701 is constructed to create meaningful quantifications and qualifications of the game 2-g. Based
upon the tracked movement of participants and the game objects in database 101 as well as the determined
performance measurements & analysis 701, a series of performance descriptors 702 is created in real-time
to act as input to a speech synthesis module in order to create an audio description of the ensuing game
2-g.

Still referring to Fig. 29a, streams of extracted blocks 102d are first sorted in the temporal domain based
upon the known participants contained in any given image, the information of which comes from the
tracking database 101. As previously taught, using either information from a helmet sticker 9a or as read
off a participant's jersey, the overhead system 20cm will first separate its extracted blocks 10e according
to player 10 and / or game object, such as 3. In those cases where multiple participants form a contiguous
shape and are therefore together in a single extracted block 10e, they are first arbitrarily separated based
upon calculations of a best dividing line(s) or curve(s). Regardless, extracted blocks 10e with multiple
players 10 can still form a single sub-stream for the given number of consecutive frames in which they
remain "joined" in contiguous pixel-space. The present inventors are referring to this process of sorting
extracted blocks by their known contents as "localization." Once localized, extracted blocks 10e are then
"normalized" whereby they may be rotated to meet a predetermined orientation as previously taught. Also
as taught, for each extracted block 10e in streams 102d there are associated corner coordinates that are
used to indicate where the given block is located with respect to the current image 10c. These corner
coordinates are contained in extracted block database 102d and are carried into any derivative databases,
the description of which is forthcoming. Note that in the case that an originally extracted block 10e
contains more than one player 10 and is therefore forcibly split as discussed, the resulting divided extracted
blocks may not necessarily be rectangular. In this case, the appropriate mathematical description of their
exterior bounding shape, similar to two opposite corner coordinates defining a rectangle, is stored in
database 102d instead.
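
The "localization" step can be pictured as grouping extracted blocks into sub-streams keyed by the participants they contain, with each sub-stream kept in frame order. The dictionary layout of a block (`frame`, `players`, `corners`) and the function name are hypothetical and are used here only to make the grouping concrete.

```python
from collections import defaultdict


def localize(extracted_blocks):
    """Group extracted blocks into sub-streams keyed by the set of identified
    players in each block, preserving temporal (frame) order."""
    sub_streams = defaultdict(list)
    for block in sorted(extracted_blocks, key=lambda b: b["frame"]):
        sub_streams[frozenset(block["players"])].append(block)
    return dict(sub_streams)


# Hypothetical blocks: player 7 alone, and players 12 and 30 visually joined.
blocks = [
    {"frame": 2, "players": [7], "corners": ((10, 10), (40, 30))},
    {"frame": 1, "players": [7], "corners": ((12, 8), (42, 28))},
    {"frame": 1, "players": [12, 30], "corners": ((50, 60), (95, 110))},
    {"frame": 2, "players": [30, 12], "corners": ((52, 61), (96, 112))},
]
streams = localize(blocks)
```

The corner coordinates travel with each block, so the sub-streams can later be normalized or compressed separately and still be returned to their original image locations.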

Once the localized, normalized sub-stream database 102e has been formed, it is then optionally
transformed into separated face regions 102f and non-face regions 102g. As previously taught, this process
relies upon overhead tracking information from database 101 that provides the location of the helmet(s) 9
within the detected player(s) 10 shape. This location is ideally determined by first detecting the location of
the helmet sticker 9a and then working with the color tone table 104a to "grow" outwards until the helmet
color tone is completely encompassed. The color tone table 104a also provides information on which of the
limited set of color tones are "uniform" versus "skin." This information can be used independently to
search the interior of shapes seen from the overhead when players 10 are not wearing helmets 9, such as in
the sports of basketball or soccer. Regardless, once the face region 10cm-a is encompassed, it can be
extracted into a separate stream 102f while its pixels are set to an ideal value, such as either null or that of
the surrounding pixels in the remaining non-face region stream 102g.
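
The "grow outwards" step can be sketched as an ordinary flood fill. This is only an illustration under stated assumptions: pixels are represented by single-character color tone codes, the sticker location is given as a seed pixel, and four-way connectivity is used. None of these choices is specified by the present application.

```python
from collections import deque

def grow_region(tones, seed, helmet_tone):
    """Flood-fill outward from the sticker location over pixels with the helmet tone.

    tones is a 2-D list of color tone codes (one per pixel), seed is (row, col).
    """
    rows, cols = len(tones), len(tones[0])
    region, queue = {seed}, deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in region
                    and tones[nr][nc] == helmet_tone):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# 'H' = helmet tone, 'U' = uniform tone, 'S' = sticker (seed) pixel
tones = [
    list("UUUUU"),
    list("UHHHU"),
    list("UHSHU"),
    list("UHHHU"),
    list("UUUUU"),
]
helmet = grow_region(tones, seed=(2, 2), helmet_tone="H")
```

The fill stops as soon as it meets pixels carrying a different tone code, so the returned set covers the sticker pixel and the surrounding helmet pixels only.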

Still referring to Fig. 29a, as will be appreciated by those familiar with sporting activities, there are a
limited number of basic positions, or poses, that any individual player 10 may take during a contest. For
instance, they may be walking, running, bending over, jumping, etc. Each of these actions can themselves
be broken into a set of basic poses. When viewed from above, as opposed to the perspective view, these
poses will be even further limited. The present inventors anticipate creating a database of such standard
poses 104b prior to any contest. Ideally, each pose is for a single player in the same uniform that they will
be using in the present contest. With each pose there will be a set orientation and zoom that can be used to
translate any current pose as captured in database 102d and optionally subsequently translated into
databases 102e, 102g and 102f. As is well known in the art, during the temporal compression of motion
video, individual frames are compared to either or both their prior frame and the upcoming frame. It is
understood that there will be minimal movement between these prior and next frames and the current
frame. The present inventors anticipate the opportunity of additionally comparing the normalized current
extracted blocks 10e found in sub-streams 102e (or any of its derivatives) with the database of standard
poses 104b. This will become especially beneficial when creating what is known as the "I" or independent
frames in a typically compressed video stream. These "I" frames, as will be understood by those skilled in
the art, are purposefully unrelated to any other frames so that they may serve as a "restarting" point in the
encoded video stream (such as MPEG2). However, the fact that they are unrelated also means that they
must carry the entire pertinent spatial information, or entropy, necessary to describe their contents. The
present inventors teach that at least these "I" frames may be first compared to their expected matches in the
standard pose database 104b based upon the translated and normalized extracted block 10e in stream 102e.
This comparison will provide a "best-fit" approximation to the current block 10e that can serve as a
predictor frame, thereby allowing for greater compression of the "I" frame as will be understood by those
skilled in the art. Since the decoder will have reference to an exactly similar standard pose database 104b
on the local system, reconstruction of the original stream's "I" frames can be accomplished via reference to
the "pose number" of the predictor in database 104b after which the "difference" frame may be applied
yielding the original "I" frame.
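
The pose-number-plus-difference idea can be sketched as follows. The sketch assumes a block is a flat list of pixel values, that poses are stored under integer pose numbers, and that the best fit is chosen by sum of absolute differences. The matching metric and all function names are assumptions made for illustration, not taken from the present application.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def encode_i_frame(block, pose_db):
    """Return (pose_number, difference) using the best-fit standard pose as predictor."""
    pose_number = min(pose_db, key=lambda n: sad(block, pose_db[n]))
    difference = [x - p for x, p in zip(block, pose_db[pose_number])]
    return pose_number, difference

def decode_i_frame(pose_number, difference, pose_db):
    """Rebuild the block from the decoder's identical copy of the pose database."""
    return [p + d for p, d in zip(pose_db[pose_number], difference)]

# Hypothetical two-entry standard pose database shared by encoder and decoder
pose_db = {0: [10, 10, 200, 200], 1: [200, 200, 10, 10]}
block = [12, 9, 198, 205]
number, diff = encode_i_frame(block, pose_db)
restored = decode_i_frame(number, diff, pose_db)
```

Only the pose number and the small difference values need to be transmitted. The difference frame has far less variation than the original block, which is what makes the subsequent compression more effective.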

The present inventors further anticipate that it may be unrealistic to have established a standard pose
database 104b prior to any given contest. However, it is possible that each new pose that is detected for
a given player 10 during the herein discussed processing of streams 102e or 102g and 102f can be added
to a historical pose database 104c1. For instance, supposing that there was no standard pose database 104b
available, then as game 2-g transpires, each player 10 will be transferring through a significant number of
poses. Essentially, each captured frame resulting in an extracted block 10e, which is then localized and
normalized, can be first searched for in the historical pose database 104c1. If it is found, this pose can be
compared to the current pose in block 10e. This comparison will yield a match percentage that if sufficient
will indicate that the historical pose will serve as a good predictor of the current pose. In this case it is
used, and otherwise it is optionally added to the historical pose database 104c1 with the idea that it may
eventually prove useful. For each current pose from localized and normalized extracted block 10e
determined not to be within historical pose database 104c1, but marked to be added to database 104c1, an
indication is encoded into the ensuing broadcast indicating that this same extracted block 10e once decoded
should be added to the parallel historical pose database 104c2 on the remote viewing system (shown in Fig.
29d). In this way, both standard pose database 104b and historical pose database 104c1 will have matching
equivalents on the decoder system, thus reducing overall transmission bandwidth requirements via the use
of references as will be understood by those skilled in the art.
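
A minimal sketch of keeping the encoder-side and decoder-side historical pose databases in step is given below. The match percentage used here (share of pixels within a tolerance), the 75% threshold and the message layout are all hypothetical choices made only to show the mechanism.

```python
def match_percentage(a, b, tolerance=8):
    """Percentage of pixels whose values differ by no more than tolerance."""
    close = sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance)
    return 100.0 * close / len(a)

def encode_block(block, history, threshold=75.0):
    """Return a message for the decoder and update the encoder's pose history."""
    best = max(range(len(history)),
               key=lambda i: match_percentage(block, history[i]),
               default=None)
    if best is not None and match_percentage(block, history[best]) >= threshold:
        diff = [x - p for x, p in zip(block, history[best])]
        return {"type": "predicted", "pose": best, "diff": diff}
    # No sufficient match: send the block whole and flag it as a new pose
    history.append(list(block))
    return {"type": "new_pose", "pixels": list(block)}

def decode_block(message, history):
    """Rebuild the block and mirror any new pose into the decoder's history."""
    if message["type"] == "new_pose":
        history.append(list(message["pixels"]))
        return list(message["pixels"])
    base = history[message["pose"]]
    return [p + d for p, d in zip(base, message["diff"])]

encoder_db, decoder_db = [], []
frames = [[10, 10, 200, 200], [12, 9, 198, 205], [200, 200, 10, 10]]
decoded = [decode_block(encode_block(f, encoder_db), decoder_db) for f in frames]
```

Because the "new pose" indication travels inside the broadcast itself, the two databases grow identically and later blocks can be sent as a pose reference plus a difference.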

Still referring to Fig. 29a, and specifically to the creation of separated face regions database 102f, the
present inventors anticipate that there will be minimal participant face regions 10cm-a within streams of
extracted blocks 102d as captured from the overhead tracking system 20cm. Furthermore, the main
purpose for separating the face regions 10cm-a is so that they may be encoded with a different technique,
such as available and well known in the art for image compression, than that chosen for the body regions,
which may not require the same clarity. From the overhead view, this additional clarity is not anticipated to
be as important as from the perspective views to be reviewed in Fig. 29b. For these reasons, face regions
102f may not be separated from non-face regions 102g. In this case, localized, normalized sub-streams
102e will be processed similarly to those ways about to be reviewed for separated non-face regions 102g.
Regardless, separated non-face regions 102g are then optionally further separated into color underlay
images 102i and grayscale overlay images 102h, by use of the color tone table 104a, as previously taught.
Furthermore, as previously taught, color underlay images 102i can either be represented as compressed
bitmap images or converted to single-color regions defined by outlines such as would be similar to the use
of b-splines in vector images.

And finally, still referring to Fig. 29a, the present inventors teach that broadcast encoder 904 may
optionally include various levels of segmented streams of current images 102a in its video stream 904v
such as: subtracted & gradient images 102b, symbolic database 102c, streams of extracted blocks 102d,
localized, normalized sub-streams 102e, separated face regions 102f, separated non-face regions 102g,
color underlay images 102i, grayscale overlay images 102h and / or color tone regions 102j. The present
inventors prefer creating a video stream 904v starting at least at the segmented level of the localized,
normalized sub-streams 102e. In this case, for each sub-stream 102e, the encoded video stream 904v will
ideally include localization data such as the sub-stream's object identification and normalization data such
as the extracted block location relative to the entire tracking surface as well as the object's rotation and
zoom (i.e. expansion factor). When optionally used, video stream 904v ideally includes codes referencing
the predictive pose from either the standard pose database 104b or historical pose database 104c1. All of
this type of "image external" information provides examples of data that is not currently either available or
included in an encoded broadcast, which essentially works with the information intrinsically contained within
the original captured images such as 10c included in streams 102a. Encoder 904 also receives performance
measurement & analysis database 701 to be encoded into its metrics stream 904m and performance
descriptors 702 to be included into its audio stream 904a.

Referring next to Fig. 29b, there is depicted the flow of data after it is originally captured by the
perspective filming assemblies 40c, which film the game from perspective view 2-pv, where it finally ends
up being assembled into a broadcast by encoder 904. Specifically, all of the perspective film begins as
streams of current images 202a as output by perspective filming assemblies 40c. As previously discussed,
the capturing of current image 10c for database 202a by assemblies 40c is intentionally controlled to occur
at a limited number of allowed pan and tilt angles as well as zoom depths. For each image captured and
stored in database 202a, its associated pan, tilt and zoom settings are simultaneously stored in database
202s. As previously taught, background panoramic database 203 can be pre-captured for each distinct
filming assembly 40c, for each possible allowed pan, tilt and zoom setting. Also as previously taught,
background database 203 can optionally include an individual captured image of the background at each of
the allowed P/T/Z settings whereby the individual images are stored separately rather than being blended
into a panoramic. Exactly similar to the method taught for keeping the background images 2r from the
overhead assemblies 20c "refreshed" with small evolving changes as contained within remainder image
10x, such as scratches on the ice surface from skates, the background database 203 is likewise evolved. As
current images 10c are added to the stream 202a their associated P/T/Z settings as stored in database 202s
are used to recall the overlapping pre-stored background image from database 203. After applying
background images 203 to the stream of current images 202a, a new dataset of subtracted & gradient
images 202b is created. From this dataset image analysis methods as previously discussed create streams
of extracted blocks 202d.
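
The recall-and-subtract step can be sketched under simple assumptions: backgrounds are stored separately per allowed setting and keyed by a (pan, tilt, zoom) tuple, images are small grayscale grids, and the "refresh" is a plain running blend of non-foreground pixels. The threshold, blend rate and function names are hypothetical. The refresh shown is a stand-in for, not a reproduction of, the remainder image 10x method taught earlier.

```python
def subtract_background(image, ptz, backgrounds, threshold=20):
    """Mark foreground pixels by differencing against the background stored for this P/T/Z."""
    background = backgrounds[ptz]
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(row, bg_row)]
            for row, bg_row in zip(image, background)]

def refresh_background(image, mask, ptz, backgrounds, rate=0.5):
    """Blend non-foreground pixels into the stored background so small changes are absorbed."""
    background = backgrounds[ptz]
    for r, row in enumerate(image):
        for c, p in enumerate(row):
            if not mask[r][c]:
                background[r][c] += rate * (p - background[r][c])

# One pre-captured background for a single allowed (pan, tilt, zoom) setting
backgrounds = {(30, 10, 2): [[100, 100, 100], [100, 100, 100]]}
image = [[104, 100, 30], [100, 96, 35]]
mask = subtract_background(image, (30, 10, 2), backgrounds)
refresh_background(image, mask, (30, 10, 2), backgrounds)
```

Restricting the cameras to a limited number of repeatable settings is what makes this lookup possible: every captured image has an exactly overlapping stored background to compare against.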

As previously taught, the extraction of the foreground from perspective view current images 10c is more
problematic than the extraction from the overhead views. By using the topological profiles 105 and
tracking database 101 created by the overhead tracking system as reviewed in Fig. 29a, image analysis can
separate foreground from fixed as well as potentially moving background such as spectators 13. Aiding in
the extraction process is the pre-determined 3-D venue model database 901 that at least helps define the
fixed versus potentially moving background areas for each and every possible perspective view given the
allowed P/T/Z settings. Also as taught, for each extracted block 10e in streams 202d there are associated
corner coordinates that are used to indicate where the given block is located with respect to the current
image that is framed according to the current P/T/Z setting. These corner coordinates are contained in
extracted block database 202d and are carried into any derivative databases, the description of which is
forthcoming.

Still referring to Fig. 29b, and exactly similar to the method steps reviewed in Fig. 29a, streams of
extracted blocks 202d are first sorted in the temporal domain based upon the known participants contained
in any given image, the information of which comes from the tracking database 101. As previously taught,
using either information from a helmet sticker 9a or as read off a participant's jersey, information from the
overhead system 20cm will be used to first separate the extracted blocks 10e according to player 10 and /
or game object, such as 3. In those cases where multiple participants form a contiguous shape and are
therefore together in a single extracted block 10e, they are first arbitrarily separated based upon
calculations of a best dividing line(s) or curve(s). Regardless, extracted blocks 10e with multiple players
10 can still form a single sub-stream for the given number of consecutive frames in which they remain
"joined" in contiguous pixel-space. The present inventors are referring to this process of sorting extracted
blocks by their known contents as "localization." Once localized, extracted blocks 10e are then
"normalized" whereby they may be rotated and / or expanded to meet a predetermined orientation or zoom
setting as previously taught. (The present inventors prefer to always expand extracted blocks to the greatest
known and controllable zoom setting but do not rule out the potential benefit of occasionally reducing
extracted blocks in size during "normalization.")

Once the localized, normalized sub-stream database 202e has been formed, it is then optionally
transformed into separated face regions 202f and non-face regions 202g. As previously taught, and using a
related set of method steps as reviewed in Fig. 29a, this process relies upon overhead tracking information
from database 101 that provides the location of the helmet(s) 9 within the detected player(s) 10 shape. This
location is ideally determined by first detecting the location of the helmet sticker 9a and then working with
the color tone table 104a to "grow" outwards until the helmet color tone is completely encompassed. Once
the outside perimeter dimensions of the helmet are determined, as will be understood by those skilled in
the art, this information can be used to determine the upper topology of each player 10's helmet 9 that is
determined to be within the view of any given perspective filming assembly 40c's current image 10c.
Within this restricted pixel area, the player 10's face region can easily be identified, especially with
reference to color tone table 104a. Hence, the color tone table 104a provides information on which of the
limited set of color tones are "uniform" versus "skin." This information can also be used independently to
search the interior of shapes seen from the perspective view when players 10 are not wearing helmets 9,
such as in the sports of basketball or soccer. Regardless, once the face region 10cm-a is encompassed, it
can be extracted into a separate stream 202f while its pixels are set to an ideal value, such as either null or
that of the surrounding pixels in the remaining non-face region stream 202g.

Still referring to Fig. 29b, and exactly similar to the discussions of Fig. 29a, there are a limited number of
basic positions, or poses, that any individual player 10 may take during a contest. For instance, they may
be walking, running, bending over, jumping, etc. Each of these actions can themselves be broken into a set
of basic poses. The present inventors anticipate creating a database of such standard poses 104b prior to
any contest. Ideally, each pose is for a single player in the same uniform that they will be using in the
present contest. With each pose there will be a set orientation and zoom that can be used to translate any
current pose as captured in database 202d and optionally subsequently translated into databases 202e, 202g
and 202f. As is well known in the art, during the temporal compression of motion video, individual frames
are compared to either or both their prior frame and the upcoming frame. It is understood that there will be
minimal movement between these prior and next frames and the current frame. The present inventors
anticipate the opportunity of additionally comparing the normalized current extracted blocks 10e found in
sub-streams 202e (or any of its derivatives) with the database of standard poses 104b. This will become
especially beneficial when creating what is known as the "I" or independent frames in a typically
compressed video stream. These "I" frames, as will be understood by those skilled in the art, are
purposefully unrelated to any other frames so that they may serve as a "restarting" point in the encoded
video stream (such as MPEG2). However, the fact that they are unrelated also means that they must carry
the entire pertinent spatial information, or entropy, necessary to describe their contents. The present
inventors teach that at least these "I" frames may be first compared to their expected matches in the
standard pose database 104b based upon the translated and normalized extracted block 10e in stream 102e.
This comparison will provide a "best-fit" approximation to the current block 10e that can serve as a
predictor frame, thereby allowing for greater compression of the "I" frame as will be understood by those
skilled in the art. Since the decoder will have reference to an exactly similar standard pose database 104b
on the local system, reconstruction of the original stream's "I" frames can be accomplished via reference to
the "pose number" of the predictor in database 104b after which the "difference" frame may be applied
yielding the original "I" frame.

Still referring to Fig. 29b and as previously stated with respect to Fig. 29a, the present inventors further
anticipate that it may be unrealistic to have established a standard pose database 104b prior to any given
contest. However, it is possible that each new pose that is detected for a given player 10 during the
herein discussed processing of streams 202e or 202g and 202f can be added to a historical pose database
204c1. For instance, supposing that there was no standard pose database 204b available, then as game 2-g
transpires, each player 10 will be transferring through a significant number of poses. Essentially, each
captured frame resulting in an extracted block 10e, which is then localized and normalized, can be first
searched for in the historical pose database 104c1. If it is found, this pose can be compared to the current
pose in block 10e. This comparison will yield a match percentage that if sufficient will indicate that the
historical pose will serve as a good predictor of the current pose. In this case it is used, and otherwise it is
optionally added to the historical pose database 104c1 with the idea that it may eventually prove useful.
For each current pose from localized and normalized extracted block 10e determined not to be within
historical pose database 104c1, but marked to be added to database 104c1, an indication is encoded into the
ensuing broadcast indicating that this same extracted block 10e once decoded should be added to the
parallel historical pose database 104c2 on the remote viewing system (shown in Fig. 29d). In this way, both
standard pose database 104b and historical pose database 104c1 will have matching equivalents on the
decoder system, thus reducing overall transmission bandwidth requirements via the use of references as
will be understood by those skilled in the art.

Still referring to Fig. 29b, and specifically to the creation of separated face regions database 202f, the
present inventors anticipate that there may be circumstances where separating the face portion of an
extracted block 10e is not beneficial to overall compression. For instance, when players 10 take up a
smaller portion of the current image 10c from perspective view 2-pv, the actual face region itself may be
minor in comparison to the other "entropy" within the image. As will be understood by those skilled in the
art, human perception of image detail is less effective for smaller, faster moving objects. The present
inventors anticipate that the tracking database 101 and 3-D venue model database 901, along with pre-
calibration of all overhead assemblies 20c and filming assemblies 40c to the venue model 901, will result
in a system capable of dynamically determining the amount of potential face area per player 10 in each
perspective film current image 10c. This dynamic determination will for instance cause zoomed in shots of
slower moving players to be separated into face regions 202f and non-face regions 202g. Conversely,
zoomed out shots of faster moving players will not be separated. Furthermore, the main purpose for
separating the face regions 10cm-a is so that they may be encoded with a different technique, such as
available and well known in the art for image compression, than that chosen for the body regions, which
may not require the same clarity. If separated, they will be seamlessly reconstructed during the decode
phase as summarized in Fig. 29d. Otherwise, the data in separated non-face region 202g will be equivalent
to localized, normalized sub-streams 202e. Regardless, separated non-face regions 202g are then optionally
further separated into color underlay images 202i and grayscale overlay images 202h, by use of the color
tone table 204a, as previously taught. Furthermore, as previously taught, color underlay images 202i can
either be represented as compressed bitmap images or converted to single-color regions defined by outlines
such as would be similar to the use of b-splines in vector images.
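
The dynamic determination of whether to separate the face region can be reduced, for illustration, to a small decision rule. The inputs (estimated face pixel count, total image pixel count, player speed) and both thresholds are assumptions chosen for the example. The present application does not state numeric criteria.

```python
def should_separate_face(face_pixels, image_pixels, speed,
                         min_fraction=0.01, max_speed=5.0):
    """Separate the face only when it is large enough and the player is slow enough.

    face_pixels / image_pixels - estimated face area within the current image
    speed                      - player speed from the tracking data (hypothetical units)
    """
    return face_pixels / image_pixels >= min_fraction and speed <= max_speed

# Zoomed-in shot of a slow player versus zoomed-out shot of a fast player
close_up = should_separate_face(face_pixels=9000, image_pixels=300000, speed=1.5)
wide_shot = should_separate_face(face_pixels=400, image_pixels=300000, speed=8.0)
```

In this sketch the close-up is separated into face and non-face regions while the wide shot is left whole, matching the two cases contrasted above.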

And finally, still referring to Fig. 29b, the present inventors teach that broadcast encoder 904 may
optionally include various levels of segmented streams of current images 202a in its video stream 904v
such as: subtracted & gradient images 202b, streams of extracted blocks 202d, localized, normalized sub-
streams 202e, separated face regions 202f, separated non-face regions 202g, color underlay images 202i,
grayscale overlay images 202h and / or color tone regions 202j. The present inventors prefer creating a
video stream 904v starting at least at the segmented level of the localized, normalized sub-streams 202e. In
this case, for each sub-stream 202e, the encoded video stream 904v will ideally include localization data
such as the sub-stream's object identification and normalization data such as the extracted block location
relative to the entire tracking surface as well as the object's rotation and zoom (i.e. expansion factor).
Associated with this will be the P/T/Z settings 202s for each extracted / translated foreground image. When
optionally used, video stream 904v ideally includes codes referencing the predictive pose from either the
standard pose database 104b or historical pose database 104c1. All of this type of "image external"
information provides examples of data that is not currently either available or included in an encoded
broadcast, which essentially works with the information intrinsically contained within the original captured
images such as 10c included in streams 102a. Encoder 904 also receives ambient audio recordings 402a as
well as their translation into volume and tonal maps 402b, as previously discussed.

Referring next to Fig. 29c, there are depicted five distinct combinations of video stream data 904v, metrics
stream data 904m and audio stream data 904a that can optionally form the transmitted broadcast created by
encoder 904. These combinations are representative and not intended by the present inventors to be
exclusive. Other combinations can be formed based upon the data sets described specifically in Figs. 29a
and 29b and in general described herein and within all prior continued applications. Examples of other
combinations not depicted within this Fig. 29c will be discussed after those shown are first described. The
combinations shown have been classified as profile A 904pA, profile B 904pB, profile C1 904pC1, profile
C2 904pC2 and profile C3 904pC3. Profile A 904pA is representative of the information contained in a
traditional broadcast and is based upon video stream 904v comprising streams of current images such as
102a and 202a as well as ambient audio recordings 402a. (Note that the present inventors are saying that
the format of the streams of current images, such as 102a and 202a, is similar to that provided to a
traditional encoder for compression and transmission. The present inventors are not implying that the
streams of current images from the overhead cameras 102a are themselves in any way traditional, or taught
by the state of the art, and in fact must first be "stitched together" from a multiplicity of overhead images,
which in itself is considered a teaching of the present application.)

Profile B 904pB represents the first level of unique content as created by the apparatus and methods taught
in the present application. Specifically, this profile 904pB comprises associated P/T/Z settings 202s
required for decoding streams of extracted blocks 102d and 202d as previously taught. Profile B 904pB
further comprises new gradient images 102b and 202b that to the end-viewer appear to be moving
"line-art." This "line-art" representation of the game activities can further be colorized to match the different
teams, especially by encoding color tone codes within the outlined region interiors as previously
discussed. (This colorized version is essentially the same information encoded in the color tone regions
102j, where the grayscale information has been removed and the images are represented as line or curve
bounded regions containing a single detected color tone.) The potential compression advantages of this
representation are apparent to those skilled in the art. It is anticipated that a particular broadcast could
contain traditional video perspective views of the game action along with a colorized "line-art" view of
the game from the overhead based upon gradient images 102b. It is also anticipated that during times of high
network traffic or less stable communications, the encoder 904 may receive feedback from the decoder 950
that could automatically "downgrade" from perspective views generated from streams of extracted blocks
202d to colorized "line-art" based upon gradient images 202b. Or, for slower speed connections, the
present inventors anticipate simply transmitting the gradient images 102b or 202b, or the color tone
regions 102j as will be discussed with profile C3 904pC3, rather than sending the streams of extracted
blocks 102d or 202d.

Still referring to Fig. 29c and profile B 904pB, the video stream optionally further comprises symbolic
database 102c, based upon information determined by the overhead tracking system 100. As previously
discussed, the anticipated symbols in database 102c include an inner oval for the location of the helmet
sticker 9a, a first outer oval representing the surrounding limits of the player's helmet 9, a second outer
oval representing the approximate shape of the player's body 10sB as well as a vector representing the
player's associated sticker 10sS. The game object, such as puck 3, will also be represented as some form of
an oval, typically a circle for the game of ice hockey. The present inventors anticipate that this symbolic
representation will provide valuable information and may further be used to enjoy a depiction of the game
via very low bandwidth connections that otherwise cannot support the live transmission of either the
extracted blocks 102d or 202d, or the gradient images 102b or 202b. Further anticipated is the ability to
colorize these symbols to help define the home and away teams and to identify each symbol by player
number and / or name based upon tracking information embedded in the symbolic database 102c.
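
The automatic "downgrade" between representations can be sketched as a lookup from available bandwidth to the richest level that fits. The bandwidth figures below are purely illustrative assumptions, as is the idea that decoder feedback arrives as a single kbit/s estimate. Only the ordering of the three levels follows the description.

```python
# (minimum kbit/s, representation), richest first; the figures are hypothetical
LEVELS = [
    (2000, "extracted blocks"),
    (300, "gradient line-art"),
    (0, "symbolic"),
]

def choose_level(available_kbps):
    """Pick the richest representation the reported bandwidth can carry."""
    for minimum_kbps, level in LEVELS:
        if available_kbps >= minimum_kbps:
            return level
    return LEVELS[-1][1]  # fall back to the lightest level

good_link = choose_level(5000)
busy_link = choose_level(800)
poor_link = choose_level(50)
```

An encoder following this sketch would re-evaluate the choice whenever new feedback arrives, stepping down to line-art or symbols during congestion and back up when the connection recovers.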

Also present in profile B 904pB is the performance measurement & analysis database 701 containing
important summations of the underlying tracking database 101. These summations as previously discussed
are anticipated to include the detection of beginning and ending of certain events. For the sport of ice
hockey, these events might include:

• a player 10's entrance into or exit from official game play,

• a scoring attempt determined when a defensive player 10 causes the puck 3 to enter a trajectory
towards the goal,

• a score where the puck 3 has entered the area of the goal and / or the game interface system has
indicated a stoppage of play due to a scored goal, and

• a power play / short handed situation where one team has at least one player 10 in game play less
than the other team.
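
As an illustration of detecting the beginning and ending of one such event, the sketch below derives power play intervals from per-sample player counts. It assumes the tracking data has already been summarized into (time, home count, away count) samples. That input format and the function names are hypothetical.

```python
def advantage(home, away):
    """Return the team with more players in game play, or None at even strength."""
    if home == away:
        return None
    return "home" if home > away else "away"

def power_play_intervals(samples):
    """Turn (time, home_count, away_count) samples into (start, end, team) intervals."""
    intervals, current, start = [], None, None
    for time, home, away in samples:
        team = advantage(home, away)
        if team != current:
            if current is not None:
                intervals.append((start, time, current))  # the advantage just ended
            current, start = team, time
    if current is not None:
        intervals.append((start, samples[-1][0], current))  # still active at the end
    return intervals

samples = [(0, 5, 5), (10, 5, 4), (20, 5, 4), (30, 5, 5), (40, 4, 5), (50, 4, 5)]
intervals = power_play_intervals(samples)
```

Each interval carries a start and end time, so it can be cross-indexed against the video and metrics streams in the same way as the other events listed above.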

The preceding examples are meant to be representative and will be the focus of a separate application by
the present inventors. The examples are not meant to be limitations of the extent of the performance
measurement & analysis database 701 that is considered by the present inventors to include significant
performance and game status information. Many other possible interpretations and summations of the
tracking database 101 are possible including player passing, hits, gap measurements, puck possession,
team speed, etc. What is important is that the present inventors teach apparatus and methods capable of
determining and broadcasting this information 701 in combination with cross-indexed video such as
streams 102d or 202d or the derivatives of these streams such as gradients 102b or 202b or symbolic
database 102c. And finally, within profile B 904pB there are also ambient audio recordings 402a as
incorporated in the traditional profile A 904pA.

Referring still to Fig. 29c, and now specifically to profile C1 904pC1, it is shown to differ from profile B
904pB in that streams of extracted blocks 102d and 202d are replaced by localized, normalized sub-
streams 102e and 202e. As was previously taught and will be understood by those skilled in the art, by
sorting the extracted blocks into sub-groups based upon player and object identity, the likelihood of
performing successful "block matching" between images in the temporal plane is greatly increased. This
increase in likelihood will positively affect both computational requirements and compression levels. More
specifically, traditional compression algorithms attempt to isolate moving foreground objects from a
potentially moving background (due to a moving camera). The process of finding foreground objects
requires a "block matching" and "motion estimation" procedure between successive video frames as will
be well understood by those skilled in the art. The present invention greatly reduces this computational
effort by first isolating the moving foreground objects based upon information collected from the overhead
tracking system that is directly relatable to the current image from each calibrated perspective view
camera. Essentially, the compression algorithms no longer have to search for moving objects between
successive frames since these objects are identifiable in real-time based upon apparatus and methods taught
herein. Each moving foreground object traverses a contiguous path in the "real domain" of the tracking
area that typically turns into a variant path across the succession of video frames. By first extracting, then
dividing and finally sorting the moving foreground objects such as players 10 into sub-streams, it is
possible to greatly limit the apparent movement in the temporal dimension as perceived by the traditional
"motion estimation" algorithms. Hence, as they search for movement from frame to frame, they are
progressively more likely to find less movement as they process localized sub-streams 102e and 202e,
versus streams of extracted blocks 102d and 202d, versus the traditional streams of current images 102a
and 202a. Furthermore, by first normalizing the localized sub-streams, so that the same player from frame
to frame does not significantly change in either size or, as much as possible, orientation, the "block
matching" algorithms are further aided. As will be appreciated by those skilled in the art, the net result of
these teachings is the effect of taking the "motion" out of what is normally "high-motion" video. This net
reduction in "motion" greatly increases compression opportunities. For instance, higher compression
methods typically reserved for use with "minimal-motion" video conferencing (such as the XYZ
technique) may now be usable with "high-motion" sports video.

Still referring to profile C1 904pC1 in Fig. 29c, the other difference versus the prior profile is the inclusion
of performance descriptors 702 and volume and tonal maps 402b in the audio stream 904a. Performance
descriptors 702 are derived primarily from performance measurement & analysis database 701 but may
also be influenced by information in 3-D venue model database 901 and tracking database 101.
Descriptors 702, as previously taught, are anticipated to be a series of encoded tokens representing a
description of the ongoing activities in the game matched to the transmitted video stream 904v, metrics
stream 904m and ambient audio recordings 402a in audio stream 904a. For the sport of ice hockey, such
descriptions may include:




• the announcement of a player 10 entering the game, whereby such an announcement may be made
as a local decision on the remote system at the time of decoding, for instance in the case the local
viewer is pre-known to be related to or interested in the player 10, or the player 10 themselves,

• the announcement of an attempted shot by a particular player 10 and its result such as blocked or
goal,

• the announcement of a team's power play with references back to results from previous power
plays in the present game, or

• the announcement of official scoring or penalty calls as gathered from the game interface system
600.

The preceding examples are meant to be representative and will be the focus of a separate application by
the present inventors. The examples are not meant to be limitations of the extent of the performance
descriptors 702 that is considered by the present inventors to include significant performance description
information. Many other possible translations of the performance measurement & analysis database 701
are possible including player passing, hits, gap measurements, puck possession, team speed, etc.
Furthermore, many other possible translations of the tracking database 101, especially with respect to the
3-D venue model database 901, are also possible including descriptions of the location of the puck 3, a
specific player 10 or the general action being in the "defensive zone," "neutral zone," or "attack zone."
What is important is that the present inventors teach apparatus and methods capable of determining and
broadcasting these descriptors 702 in combination with cross-indexed video stream 904v, metrics stream
904m and other information in audio stream 904a. As has been discussed and will be reviewed in
association with Fig. 29d, these tokens may be used to automatically direct text-to-speech synthesis
software modules with the net result of creating an automated game commentary audio track.
And finally, volume & tonal maps 402b represent encoded samplings of the ambient audio recordings
402a designed to create a significantly more compressed representation of the audio environment of the
ongoing contest, as will be understood by those skilled in the art. The present inventors anticipate that the
exact nature of the sounds present at a sporting contest is not as important, and is in fact not as noticed,
as is the general nature of the ambient sounds. Hence, the fact that the crowd noise is increasing or
decreasing in volume at any given time carries a significant portion of the real audio "information"
perceived by the end viewer and is much simpler to encode than an actual sound recording. The present
inventors refer to this as a "tonal map" that is at its simplest a continuous stream of decibel levels and at its
most complex a set of decibel levels per predetermined pitches, therefore referred to as a "tonal map."
These maps may then be used during the decode phase to drive the synthetic recreation of the original
game audio track. The present inventors further anticipate using information from the performance
measurement & analysis database 701 to further augment the synthesized audio reproduction, for instance
by the addition of a "wtoiing, police siren-like goal-scored" sound often found at a hockey game. 
Regardless, what is important is that the present inventors anticipate reducing the bandwidth requirements
of the audio stream 904a portion of the encoded broadcast to minimally include tokens or other
representations that are not in audio form but which can be translated into synthesized audible signals in
order to add a realistic audio representation of the game's activity.
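
As a hedged illustration of the "tonal map" idea, the following Python sketch reduces one window of audio samples to a single decibel level and, for the more complex form, to one decibel level per predetermined pitch. The function names, the sample rate, the pitch list and the use of the Goertzel recurrence to measure energy near a pitch are all assumptions made for illustration; the specification does not name a particular analysis technique.

```python
import math

def volume_db(samples):
    """Simplest tonal map entry: RMS level of one window, in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-9))  # floor avoids log10(0) on silence

def pitch_db(samples, sample_rate, pitch_hz):
    """Energy near one pitch via the Goertzel recurrence (an assumed technique)."""
    coeff = 2 * math.cos(2 * math.pi * pitch_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s_prev, s_prev2 = x + coeff * s_prev - s_prev2, s_prev
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    return 10 * math.log10(max(power, 1e-12))

def tonal_map(samples, sample_rate, pitches):
    """One decibel level per predetermined pitch for a single window."""
    return {p: pitch_db(samples, sample_rate, p) for p in pitches}

# Hypothetical window: 0.1 s of a 440 Hz tone sampled at 8 kHz.
RATE = 8000
window = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(800)]
levels = tonal_map(window, RATE, [440, 1000])
```

A stream of such per-window values, rather than the recorded waveform, is what a decoder would use to drive synthesized crowd noise.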

Referring still to Fig. 29c, and now specifically to profile C2 904pC2, it is shown to differ from profile C1
904pC1 in that localized, normalized sub-streams 102e and 202e are now further segmented into separated
non-face regions 102g and 202g as well as separated face regions 102f and 202f. As was previously taught,
such separation is possible based upon the apparatus and methods taught herein and specifically allowing
for the efficient real-time location of the exact pixel area within a given current image 10c and its extracted
blocks 10e, where the face region is expected to be found. Furthermore, use of the color tone table is an
important method for isolating skin versus uniform, which is even more relevant after moving backgrounds
of spectators have been removed, again based upon the teachings of the present application. As will be
understood by those skilled in the art, different compression methods may be applied to non-face regions
102g and 202g versus face regions 102f and 202f based upon the desired clarity. Furthermore, as
previously discussed, the decision to make this further segmentation can be dynamic. For instance, during
close-up filming of one or more players, it is anticipated to be beneficial to separate the face region for a
"better" encoding method that retains further detail. Since the uniform is not expected to be as "noticed" by
the viewer, the clarity of its encoding method is less significant. However, since the uniform encompasses
a greater pixel area than the face region, using a more compressed method offers significant overall
compression advantages, as will be understood by those skilled in the art.

Referring still to Fig. 29c, and now specifically to profile C3 904pC3, it is shown to differ from profile C2
904pC2 in that separated non-face regions 102g and 202g have themselves been segmented and
transmitted as color underlay images 102i and 202i and grayscale overlay images 102h and 202h. As
previously discussed, using pre-known color tone table 104a, the present invention teaches a method for
first subtracting from each pixel identified to be a part of the foreground image the nearest associated color
tone. The resulting difference value is to be assigned to the associated pixel in the grayscale overlay
images 102h and 202h. As will be understood by those skilled in the art, what is left in the color underlay
images are areas of contiguous pixels comprising the same nearest matching, or subtracted, color tone. As
will also be understood, this process has removed the higher frequency pixel color / luminescence
variations from the color underlay images 102i and 202i and placed them in the overlay images 102h and
202h. This inherently makes the underlay images 102i and 202i more compressible using traditional
methods. The present inventors prefer an approach that first converts the RGB three byte encoding of each
foreground pixel to its YUV equivalent, as will be understood by those skilled in the art. This
transformation in color representation methods results in a separation of the hue and saturation, referred to
as UV, and the luminescence, referred to as Y. In practice, this conversion should always provide a UV
value very near one of the pre-known color tones in table 104a. Once the nearest matching color tone is
identified from the table 104a, it is used to reset the UV value of the foreground pixel, hence locking it in
to the color that it is determined to be most closely matching. (Note that the pre-known color tones in table
104a are preferably stored in the UV format for easier comparison.) The already converted luminescence
value then becomes the pixel value for the grayscale overlay images 102h and 202h. Again, as will be
understood by those skilled in the art, the process of removing the luminescence is a well known approach
in image compression. What is further taught is the resetting of the UV values to their nearest match in the
color tone table 104a with the understanding that these are the only possible colors on the detected
foreground objects. This "flattening" process removes minor variations due to different forms of noise and
creates a much more compressible color underlay image 102i and 202i.
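
The "flattening" step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the inventors' implementation: the full-range BT.601 (JFIF-style) equations stand in for the RGB to YUV conversion, nearness is measured by squared distance in the UV plane, and the two-entry `TONE_TABLE` and all function names are hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Full-range BT.601 (JFIF-style) conversion, assumed here for the RGB to YUV step."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    v = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, u, v

def nearest_tone(u, v, tone_table):
    """Index of the pre-known (U, V) tone closest to the pixel's own (U, V)."""
    return min(
        range(len(tone_table)),
        key=lambda i: (tone_table[i][0] - u) ** 2 + (tone_table[i][1] - v) ** 2,
    )

def flatten_pixel(rgb, tone_table):
    """Split one foreground pixel into a tone code (underlay) and a 6-bit Y value (overlay)."""
    y, u, v = rgb_to_yuv(*rgb)
    tone_code = nearest_tone(u, v, tone_table)   # UV is reset to this table entry
    luma_6bit = min(63, int(y) >> 2)             # 256 levels reduced to 64
    return tone_code, luma_6bit

# Hypothetical table of pre-known tones stored as (U, V) pairs: a red jersey and white trim.
TONE_TABLE = [rgb_to_yuv(200, 30, 30)[1:], rgb_to_yuv(255, 255, 255)[1:]]
```

A slightly noisy red pixel such as `(190, 40, 35)` maps to tone code 0, so neighboring jersey pixels collapse to the same code and only their brightness differs in the grayscale overlay.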

The present inventors further prefer limiting the Y or luminescence values to 64 variations as opposed to
256 possible encodings in the traditional 8 bit format. One reason for this is that studies have shown that
the human eye is capable of detecting only about 100 distinct grayscales (versus 100,000 to 200,000 hue /
saturation combinations). Furthermore, for smaller faster moving objects the eye's ability to distinguish
distinct values is even further limited. Therefore, for the sake of higher compression values, the present
inventors prefer a 6 bit, rather than 8 bit, encoding of luminescence. This six bit encoding will effectively
represent 1 to 64 possible brightness variations on top of each color tone in table 104a.
As will be understood by those skilled in the art, traditional methods of encoding Y and UV values have
typically adopted an approach that favors recording the Y value for every pixel with 8 bits or 256
variations, while both the U and V values are recorded for every fourth pixel with 8 bits or 256 variations.
Thus, every four-square block of pixels requires 4 * 8 = 32 bits to encode luminescence and 1 * 8 = 8 bits
to encode hue and 1 * 8 = 8 bits to encode saturation, for a total of 48 bits. This approach is satisfactory
because human perception is more sensitive to variations in luminescence versus hue and saturation
(color). Note that this provides a 50% savings in bit rate over the RGB encoding, which requires 8 bits for
each color, red (R), blue (B) and green (G), and therefore a total of 4 * 3 * 8 = 96 bits. The present
inventors prefer encoding the Y value with 6 bits (i.e. 64 grayscale variations) over 3/4 of the pixels,
therefore yielding 3 * 6 = 18 bits. Furthermore, the U and V values are essentially encoded into the color
tone table 104a. Thus, the present inventors prefer encoding the color tone for every fourth pixel using 6 bits
(i.e. 64 possible color tones), therefore yielding 1 * 6 = 6 bits. This combination provides a total of 24 bits,
which is a 50% reduction again over traditional compression. Note that the approach adopted by the
present teachings allows for the face regions 102f and 202f to be separated with the idea that the traditional
48 bit encoding could be used if necessary to provide greater clarity, at least under select circumstances
such as close up shots of slow moving or stationary players where any loss of detail would be more
evident. It should not be construed that this preferred encoding method is strictly related to the video
stream 904v in profile C3 904pC3. The present inventors anticipate this encoding method will have
benefits on each and every level from profile B 904pB through to that presently discussed. Furthermore,
these profiles are meant to be exemplary and may themselves become variations of each other. For
instance, it is entirely possible and of anticipated benefit to employ the color tone table 104a during the
creation of the streams of extracted blocks 102d and 202d. In this case, encoding methods such as the 24
bit Y / Color Tone method just described may be implemented. What is important is that the individual
opportunities for broadcast encoding that arise from the apparatus and methods of the present application
may be optimally constructed into unique configurations without departing from the teachings herein, as
will be understood by those skilled in the art.
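
The bit counts stated above can be checked with a few lines of Python. The helper name `block_bits` and the constant `PIXELS_PER_BLOCK` are illustrative only; the sample counts per four-pixel block are taken directly from the text.

```python
def block_bits(luma_bits, luma_samples, chroma_bits, chroma_samples):
    """Bits needed to encode one four-pixel block."""
    return luma_bits * luma_samples + chroma_bits * chroma_samples

PIXELS_PER_BLOCK = 4

rgb_bits = PIXELS_PER_BLOCK * 3 * 8    # 8 bits each for R, G and B on every pixel
yuv_bits = block_bits(8, 4, 8, 2)      # 8-bit Y on every pixel; one U and one V per block
tone_bits = block_bits(6, 3, 6, 1)     # 6-bit Y on 3 of 4 pixels; one 6-bit tone code per block

for name, bits in [("RGB", rgb_bits), ("YUV", yuv_bits), ("Y / Color Tone", tone_bits)]:
    print(f"{name}: {bits} bits per block, {bits / PIXELS_PER_BLOCK:.1f} bits per pixel")
```

Each step halves the previous figure: 96 bits for RGB, 48 bits for the traditional YUV layout and 24 bits for the Y / Color Tone layout.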




And finally, still with respect to Fig. 29c and profile C3 904pC3, it is possible to alternately encode and
transmit color underlay images 102i and 202i as color tone regions 102j and 202j. As will be understood
by those skilled in the art, color underlay images 102i and 202i have essentially been "flattened," thereby
creating whole pixel areas or regions of the foreground object containing a single color tone. As will also
be appreciated, as these regions grow in size, it may become more beneficial to simply encode the region's
border or outline along with a code indicating the interior color tone rather than attempting to encode every
"macro-block" within each region. The present inventors anticipate that this decision between the
traditional "raster" approach that encodes pixels versus the "vector" approach that encodes shape outlines
with interior region color descriptions can be made dynamically during the broadcast encoding. For
instance, one particular player 10-1 may appear through a given sequence of image frames at a much closer
distance than another player 10-2. Player 10-1 therefore is taking up more pixels relative to the entire
current frame and is also more "visible" to the end viewer. In this case, after localization that breaks this
player 10-1's foreground information into its own localized and normalized sub-perspective view stream
202e, the encoder 904 may preferably choose to create separated face region 202f from non-face region
202g so that player 10-1's face may be encoded with more detail using traditional 48 bit YUV encoding.
Conversely, player 10-2, who appears further away in the present image, is also first localized and
normalized into stream 202e. After this, encoder 904 may preferably choose to skip straight to color
tone regions 202j with grayscale overlay images 202h using the aforementioned 24 bit Y / Color Tone
encoding for the regions 202j.

The present inventors wish to emphasize the importance of the various teachings of new apparatus and
methods within the present invention that provide critical information necessary to drive the
aforementioned and anticipated dynamic broadcast encoding decisions. For instance, the information
collected and created by the overhead tracking system 100 provides critical data that is necessary for
encoder 904 to determine dynamic option parameters. Examples of such critical data are:

• what player 10-1 is currently being viewed in the extracted foreground block 10e?;

• what are the color tones that are expected to be found in this player 10-1?;

• are there any other players such as 10-2 or 10-3 that are calculated to be obstructing view of player
10-1?;

• if so, what color tones 10ct may be expected on obstructing players 10-2 or 10-3?;

• where is the helmet of player 10-1 in the current extracted block and therefore also, where is
player 10's face region and how many pixels does it take up?;

• what is the relative speed of player 10-1 taking into account the known P/T/Z movements of the
filming camera assembly 40c capturing extracted block 10e?;

• what is the pixel area taken by player 10-1?, and

• how is all of this information anticipated to change in the directly ensuing image frames based
upon known trajectory vectors of players 10-1, 10-2 and 10-3, etc.?
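
One way to picture how such tracking data could drive a dynamic encoding decision is the sketch below. Everything in it is an assumption for illustration: the `BlockContext` field names, the thresholds in `choose_encoding`, and the rule that a close, slow-moving player gets a separated face region while other players go straight to color tone regions. The specification describes the inputs and the kinds of choices, not a specific rule.

```python
from dataclasses import dataclass, field

@dataclass
class BlockContext:
    """Tracking-derived facts about one extracted foreground block (hypothetical field names)."""
    player_id: str
    expected_tones: list                       # tone codes expected on this player
    occluder_tones: list = field(default_factory=list)  # tones on obstructing players
    face_pixels: int = 0                       # pixel area of the located face region
    player_pixels: int = 0                     # pixel area of the whole player
    relative_speed: float = 0.0                # apparent speed after removing camera P/T/Z motion

def choose_encoding(ctx, min_face_pixels=400, max_speed=2.0):
    """Pick encoding options for one block; the thresholds are illustrative only."""
    close_and_slow = ctx.face_pixels >= min_face_pixels and ctx.relative_speed <= max_speed
    return {
        "separate_face": close_and_slow,
        "face_method": "48-bit YUV" if close_and_slow else None,
        "underlay_form": "color underlay images" if close_and_slow else "color tone regions",
        "tones": sorted(set(ctx.expected_tones) | set(ctx.occluder_tones)),
    }

near = BlockContext("10-1", [3, 7], [], face_pixels=900, player_pixels=12000, relative_speed=0.5)
far = BlockContext("10-2", [3, 7], [12], face_pixels=60, player_pixels=800, relative_speed=0.5)
```

With these sample values, `choose_encoding(near)` requests a separated face region, while `choose_encoding(far)` does not and lists the occluder's tone alongside the player's own.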




This list as provided is meant to summarize the effective value of the combination of the use of a tracking
system with that of a filming system. The present inventors anticipate other critical information, some as
previously taught and implied herein, and some as will be obvious to those skilled in the art, that have not
been expressly discussed. What is important are the benefits to the encoding process based upon a controlled
filming system that can be gained via the integration with an object tracking system.
Referring next to Fig. 29d, there is shown the four "non-traditional" profiles B through C3, 904pB through
904pC3 respectively, as first depicted in Fig. 29c, being presented by broadcast encoder 904 to decoder
950 that ultimately creates a viewable broadcast 1000. With respect to the present Fig. 29d, the
interpretation of the most segmented profile, namely C3 904pC3, will be discussed in detail. As will be
understood by those skilled in the art, similar concepts are likewise applicable to the remaining less
segmented profiles B 904pB through C2 904pC2. First, it is understood that any remote system receiving
the broadcast from encoder 904 should already have access to the following pre-established databases:

• the 3-D venue model database 901 describing the facility where the broadcasted game is being
played;

• the background panoramic database 203 for all perspective filming assemblies 40c contributing to
the received broadcast as well as an overall background for the overhead views captured by
assemblies 20c;

• the 3-D ad model database 902 containing at least virtual advertisements in the form of floating
and fixed billboards registered to the 3-D venue model database 901;

• the color tone table 104a containing the UV (hue and saturation) equivalent values for between
preferably 1 to 64 distinct uniform and skin color tones expected to be found on both home and
away players;

• the standard pose database 104b of pre-captured images in "extracted block" form that can be
used as predictors at least for the "I" (independent) frames associated with a given video stream;

• the description translation rules 703a that define how performance descriptors 702 should be
converted into text and then synthesized into speech,

• the audio map translation rules 403a that define how the volume and tonal maps 402b should be
converted into synthesized crowd noise, and

• the viewer profile & preferences 951 describing important marketing information describing the
viewer(s) as well as their relationship to the game in addition to holding information concerning
the actual configuration of the viewable broadcast 1000 that they would prefer.

The present inventors anticipate that these aforementioned databases will be made available via a data
storage medium such as CD ROM or DVD to the user on the remote system. These files are then copied
onto the remote system in such a way that they are available to decoder 950. It is further anticipated that
either some or all of the files could either be downloaded or updated with changes or additions via the
Internet, preferably using a high speed connection. What is important is the teachings of the present
invention that show how the pre-establishment of this information on the remote decoding system can be
used to ultimately reduce the required bandwidth of the broadcast created by encoder 904.
Still referring to Fig. 29d, in addition to the aforementioned pre-established databases, the present
invention also teaches the use of a set of accumulated databases as follows:

• the historical pose database 104c2 of saved poses from the recreated broadcast stream being 
received from encoder 904 that may be used in a similar fashion to any standard poses in database 
104b; 

• the historical performance database 701a that is accumulated from the transmitted performance
measurement & analysis database 701 and may include the current game as well as all other
viewed games, thereby providing a background of measurements against which the current game may
be contrasted, and

• the historical descriptor translations 703b that are accumulated from the actual translations of the
performance descriptors 702 as they are operated upon using rules 703a and may include the
current game as well as all other viewed games, thereby providing a background of phraseology
that has been used previously by which the current game's translations may be influenced.

As video stream 904v, metrics stream 904m and audio stream 904a are received from broadcast encoder
904 by decoder 950, the aforementioned pre-established and accumulated historical databases cooperate to
translate the encoded information into broadcast 1000 under viewer directives as stored in profile &
preferences database 951. Specifically, with reference to the decoding of profile C3 904pC3, decoder 950
may receive color underlay images 102i and 202i that are translated via color tone table 104a into their
appropriate UV (hue and saturation) values per pixel. As previously stated, the images themselves
preferably include a single 6 bit code for every four-pixel block. Each 6 bit code represents 1 of 64
possible color tones 10ct that are then translated into an equivalent 8 bit U (hue) and 8 bit V (saturation)
for use in the final display of images. Note that the equivalent 8 bit U and 8 bit V values do in fact
represent one "color" or hue / saturation out of 256 * 256 = 65,536 possible choices. Hence, the video card
on the end user's PC will use the resulting UV code to choose from amongst 65,536 displayable colors.
The present invention is simply taking advantage of the fact that it is pre-known up front that there are
never more than a total of 64 of these possible 65,536 being used on any home or away team uniform or
equipment or in any player's skin tone. The present inventors anticipate that should there be circumstances
whereby there are more than 64 possible colors that may be present on a foreground object, some of these
colors can be "dropped" and therefore included with the next closest color, especially since they may not
appear on large spaces or very often and for all intents and purposes will not be noticed by the viewing
audience.
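
On the decode side, the table lookup described above might look like the following Python sketch. It is an illustration under assumptions: the inverse full-range BT.601 (JFIF-style) equations stand in for the YUV to RGB step, the 6-bit luminescence is expanded by a simple left shift, and `TONE_TABLE` holds two hypothetical (U, V) entries where a real table would hold up to 64.

```python
def yuv_to_rgb(y, u, v):
    """Inverse full-range BT.601 (JFIF-style) conversion, assumed here for display."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return tuple(max(0, min(255, round(c))) for c in (r, g, b))

def decode_pixel(tone_code, luma_6bit, tone_table):
    """Expand a 6-bit tone code and a 6-bit luminescence value into an RGB pixel."""
    u, v = tone_table[tone_code]   # 6-bit code -> 8-bit U and 8-bit V
    y = luma_6bit << 2             # 64 levels spread over the 0..252 range
    return yuv_to_rgb(y, u, v)

# Hypothetical table: entry 0 is a red jersey tone, entry 1 is neutral white/gray.
TONE_TABLE = [(99, 213), (128, 128)]

UV_COMBINATIONS = 256 * 256        # 65,536 possible 8-bit U / 8-bit V pairs
```

Only the 6-bit index travels in the broadcast; the 16 bits of U and V it stands for are already present on the remote system.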

Still referring to Fig. 29d, it is possible that the encoder 904 will alternatively have chosen to transmit
color tone regions 102j and 202j versus color underlay images 102i and 202i. As previously stated, this is
primarily a difference between vector versus raster based encoding, respectively. In this case, regions
102j and 202j are first translated into equivalent bitmap representations as will be understood by those
skilled in the art. These bitmap representations will then also be assigned UV values via the color tone
table 104a as previously stated for the color underlay images 102i and 202i. It is possible that either color
underlay images 102i and 202i or color tone regions 102j and 202j will be referenced to, or "predicted
from," a standard pose in database 104b or a historical pose in database 104c2. As will be understood by
those skilled in the art, these standard or historical poses would then become the underlying pixel image to
which the transmitted "difference" image, either in the form of color underlay images 102i and 202i or
color tone regions 102j and 202j, would then be "added" in order to return to an original current player 10
pose. The end result of all of these possible decoding paths is the recreation of foreground overlays 102dR
and 202dR. Note that once a foreground overlay 102dR and 202dR has been recreated, a directive may
also be embedded in the transmitted data indicating that this particular pose should be stored in the
historical pose database 104c2 for possible future reference. The present inventors anticipate flagging such
poses on the encoding side due to information that indicates that, for instance, a player 10 is being viewed
in isolation, they are relatively close-up in view, and that the orientation of their pose is significantly
different from any other such previously saved poses in the uniform colors they are currently wearing.
Also potentially adding to recreated foreground overlays 102dR and 202dR are translated separated face
regions 102f and 202f. As previously stated, separated face regions 102f and 202f are optionally created by
encoder 904 particularly under those circumstances when greater image clarity is desired as opposed to
separated non-face regions 102g and 202g. Their translation is exactly similar to that of color underlay
images 102i and 202i in that the color tone table 104a will be used to translate color tones 10ct into UV
values and standard pose database 104b or historical pose database 104c2 will optionally be used as
"predictors." After the translation of either color underlay images 102i and 202i or color tone regions 102j
and 202j, and then optionally separated face regions 102f and 202f, grayscale overlay images 102h and
202h are themselves translated and added onto the current recreated foreground overlays 102dR and
202dR. Specifically, grayscale overlay images 102h and 202h are decoded in a traditional fashion as will
be understood by those skilled in the art. This additional luminescence information will be used to augment
the hue and saturation information already determined for the recreated foreground overlays 102dR and
202dR.
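
The prediction from a stored pose can be sketched as follows. This is a hedged illustration only: poses are modeled as small grids of tone codes, the transmitted "difference" image is assumed to be sparse with `None` meaning "same as the predictor", and the function names and the `store_as` directive are hypothetical.

```python
def apply_difference(pose, difference):
    """Overlay a sparse difference image on a predictor pose.

    Both arguments are row-major grids of tone codes; None in the difference
    means the predictor's value is kept (an assumed convention).
    """
    return [
        [p if d is None else d for p, d in zip(pose_row, diff_row)]
        for pose_row, diff_row in zip(pose, difference)
    ]

def recreate_overlay(pose_id, difference, standard_poses, historical_poses, store_as=None):
    """Rebuild a foreground overlay and optionally save it as a new historical pose."""
    if pose_id in historical_poses:
        predictor = historical_poses[pose_id]
    else:
        predictor = standard_poses[pose_id]
    overlay = apply_difference(predictor, difference)
    if store_as is not None:       # directive embedded by the encoder
        historical_poses[store_as] = overlay
    return overlay

standard = {"skating": [[1, 1], [2, 2]]}
historical = {}
overlay = recreate_overlay("skating", [[None, 5], [None, None]], standard, historical,
                           store_as="player-10-pose-a")
```

After this call, `historical` holds the recreated pose under the hypothetical key `"player-10-pose-a"`, so later frames can be predicted from it.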

Still referring to Fig. 29d, after overlays 102dR and 202dR have been recreated, they are placed on top of
recreated background underlays 203R, forming single images in the streams of current images 102aR
and 202aR, as will be understood by those skilled in the art. Background underlays 203R are recreated to
match the transmitted associated P/T/Z settings 202s. Essentially, as was previously taught, for each
current image 10c taken from a filming assembly 40c, the assembly's perspective, or view, was fixed at a
pre-determined orientation as expressed in pan, tilt and zoom settings. While the encoding process then
removes and eliminates the background, the decoding process must first restore either an equivalent
"natural" or "animated" background. As was previously taught, in order to recreate an equivalent "natural"
background, the associated P/T/Z settings can be used to extract directly from the background panoramic
database 203 approximately the same pixels that were originally removed from image 10c. When used as
an underlay 203R, the resulting current image in streams 102aR and 202aR will look "realistic" and for all
intents and purposes indistinguishable to the viewer.

The present inventors also anticipate that it will be highly beneficial to be able to insert realistic looking
advertisements in between the recreated background 203R and the merged in foreground 102dR and
202dR, making it seem as if the ad, or billboard, was actually in the stadium all along. As previously
discussed, these advertisements can be drawn from a larger 3-D ad model database 902 on the decode side
of the transmission, thereby not only saving in required bandwidth, but perhaps more importantly, allowing
for customized insertion based upon the pre-known viewer profile & preferences 951.
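
The layering order described here, background underlay first, then the advertisement, then the recreated foreground, can be illustrated with a short Python sketch. The grid representation, the use of `None` for transparent pixels and the function name `composite` are assumptions for illustration.

```python
def composite(background, ad_layer, foreground):
    """Stack three equally sized pixel grids; None marks a transparent pixel in the upper layers."""
    frame = []
    for bg_row, ad_row, fg_row in zip(background, ad_layer, foreground):
        row = []
        for bg, ad, fg in zip(bg_row, ad_row, fg_row):
            if fg is not None:        # player pixels hide everything behind them
                row.append(fg)
            elif ad is not None:      # the ad hides only the background
                row.append(ad)
            else:
                row.append(bg)
        frame.append(row)
    return frame

# Hypothetical 2 x 3 frame: "B" background, "A" billboard, "P" player.
background = [["B", "B", "B"], ["B", "B", "B"]]
ad_layer = [[None, "A", "A"], [None, None, None]]
foreground = [[None, None, "P"], [None, None, "P"]]
frame = composite(background, ad_layer, foreground)
```

Because the player layer is applied last, a player skating in front of the billboard occludes it, which is what makes the inserted ad appear to be part of the venue.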
Still referring to Fig. 29d, under certain circumstances such as in response to the viewer profile &
preferences 951, an animated background will be used rather than the natural one just described. In this
case, associated P/T/Z settings 202s are interpreted in light of the 3-D venue model database 901, thereby
determining exactly which part of the stadium is within the current view. As this is known, the 3-D model
901 may contain information, such as background colors and texture, necessary to drive an animation
program as will be understood by those skilled in the art. Similar to the natural background, advertisements
from database 902 can be overlaid onto the animated background forming the background underlays 203R.
Regardless, once streams of current images are available, the video portion(s) of broadcast 1000 can be
controlled via the profile & preferences database 951 that is anticipated to be interactive with the viewer.
The present inventors further anticipate that as the viewer indicates changes in preference from a certain
view to a different view or views, this information can be fed back to the encoder 904. In this way,
encoder 904 does not have to transmit all possible streams from either the overhead assemblies 20c or
perspective assemblies 40c. Furthermore, it is possible that in response to the viewer profile & preferences
951 only the gradient images 102b and 202b are transmitted, and / or only the symbolic data 102c, etc.
Specifically, with respect to gradient images 102b and 202b, when present in video stream 904v they can
be translated using traditional techniques as will be understood by those skilled in the art based upon either
raster or vector encoding. Furthermore, using color tone table 104a, they can be colorized to better help the
viewer distinguish teams and players. If symbolic database 102c is present in the video stream 904v, it can
be overlaid onto a graphic background depicting the playing surface and colorized using color tone table
104a. Furthermore, the present inventors anticipate overlaying useful graphic information onto any of the
created views being displayed within broadcast 1000 based upon either performance measurement &
analysis database 701 or its historical database 701a. Such graphic overlays, as previously taught, may
include at least a floating symbol providing a player 10's number or name, or it may show a continually
evolving streak representing the path of a player 10 or the puck 3. These overlays may also take the form
of the traditional or newly anticipated statistics and measurements. Rather than overlaying this information
onto the continuing video portion of the broadcast 1000, the present inventors anticipate creating a "game
metrics window" as a portion of the entire screen that will display information primarily in textual form
directly from either the performance measurement & analysis database 701 or its historical database 701a.
The decision on the types of information to display and their format is carried in the viewer profile &
preferences database 951.




And finally, with respect to the audio portion of broadcast 1000, the present inventors prefer using the
volume & tonal maps 402b as interpreted via the audio map translation rules 403a in order to synthesize a
recreation of the original stadium sounds. Again, viewer profile & preferences 951 are used to indicate
whether the viewer wishes to hear the original sounds intact or a "filled-to-capacity" recreation. As
previously discussed, game commentary can also be added to broadcast 1000 by processing performance
descriptors 702 along with the historical database 703b via translation rules 703a. The present inventors
anticipate that rules 703a in conjunction with the viewer profile & preferences 951 will at least govern the
choices and implementation of:

• the commentator's voice, that is effectively embedded in the text-to-speech engine, as will be 
understood by those skilled in the art, 

• the expression styles, such as for children, youth or adults, and 

• the level of detail in the commentary. 

Historical descriptor database 703b is anticipated to be very helpful in keeping the speech fresh by making 
sure that certain speech patterns are not overly repeated unless, for instance, they represent a specific 
commentator's style. 
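
The role of the historical descriptor database in avoiding repeated speech patterns can be illustrated with a minimal sketch. It is not the inventors' implementation: the function name `pick_phrase`, the bounded `recent` history standing in for database 703b, and the `allow_repeat` set standing in for a commentator's signature phrases are all hypothetical.

```python
from collections import deque


def pick_phrase(candidates, recent, allow_repeat=frozenset()):
    """Return a phrase for the current event while avoiding recent repeats.

    candidates   -- phrases that could describe the event, in preference order
    recent       -- deque of recently spoken phrases (hypothetical stand-in
                    for historical descriptor database 703b)
    allow_repeat -- phrases that belong to a commentator's style and may
                    therefore be repeated freely (hypothetical)
    """
    for phrase in candidates:
        if phrase in allow_repeat or phrase not in recent:
            recent.append(phrase)
            return phrase
    # Every candidate was spoken recently: reuse the one heard longest ago.
    history = list(recent)
    oldest = min(candidates, key=history.index)
    recent.append(oldest)
    return oldest


recent = deque(maxlen=3)  # remember only the last three phrases
options = ["He shoots, he scores!", "What a goal!"]
first = pick_phrase(options, recent)
second = pick_phrase(options, recent)
third = pick_phrase(options, recent)
```

Here the first two calls return different phrases, and the third call falls back to the phrase that was spoken longest ago because both options have already been used.
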

The end result of the entire decoding process discussed in detail for profile C3 904pC3 and implied in 
general for the remaining profiles and any other possible combinations of the datasets taught in the present 
application, is the creation of a broadcast 1000 representing video 904v, metrics 904m and audio 904a 
information. 

Conclusion and Ramifications 

The above stated objects and advantages are to be taught in cooperation in the present invention, thereby 
disclosing the elements of a complete Automatic Event Videoing, Tracking and Content Generation 
System. However, the present inventors recognize that specific elements are optional and either would not 
be required under certain circumstances or for particular sports. It is also noted that removal of these 
optional elements does not reduce the novel usefulness of the remaining aspects of the specification. Such 
optional elements include: 

1. The automatic game filming system 200 if perspective view game film is not desired; 

2. The interface to manual game filming 300 if manual game filming cameras will not be used; 

3. The spectator tracking & filming system 400 if additional video and audio from the spectators is 
not desired to enhance the broadcast; 

4. The player & referee identification system (using jersey numbers) 500 if other techniques such as 
helmet stickers 9a or helmet transponders 9t are used to identify participants; 

5. The game clock and official scoring interface system 600 if it is preferred that operators 613 
control the game clock and scoreboard; 

6. The performance measurement & analysis system 700 if only time synchronized game film is 
desired; 




7. The interface to performance commentators 800 if game commentators are not present or it is not 
desired that their comments be added to the broadcast; 

8. The overhead image database 102 if overhead game film is not desired, and 

9. The encoded broadcast 904 and broadcast decoder 950 if the broadcast is to be generated live and 
presented locally without need for compression and transmission to a remotely networked or 
connected system. 

What is preferred and first claimed by the present inventors is the minimum configuration expected to be 
necessary to create a meaningful and enjoyable broadcast including: 

1. The tracking system 100 with both the tracking database 101 and overhead image database 102; 

2. The automatic game filming system 200; 

3. The performance measurement & analysis system 700, and 

4. The automatic content assembly & compression system 900 without encoded broadcast 904 and 
broadcast decoder 950. 

The combined elements of this minimum configuration are anticipated to provide: 

1. Game film taken from the overhead view including the adjacent team bench, penalty waiting and 
entrance / exit areas that, at least for the indoor sport of ice hockey, is currently only available at 
the professional level where the arena structure allows for ceiling cameras hundreds of feet above 
the playing surface; 

2. Game film taken from at least one perspective view that is automatically adjusted to follow either 
the contest's center-of-play, or any center-of-interest, that is currently only available from systems 
that employ electronic transponders affixed to the game object or one or more participants; 

3. Real-time digital measurements of key game activities including participant and game object 
locations and orientation, providing the basis for the automatic generation of statistics, the 
detection of specific events and the assessment of participant performance that is currently 
unavailable in full and only partially available via location tracking based upon electronic 
transponders affixed to the game object or one or more participants, and 

4. An integrated multi-media presentation of all game film synchronized at least by both time and 
detected game events that is currently only available through the use of film collection systems 
that accept operator based judgments to define game events. 

The remaining optional elements add to the following provisions: 

5. Game film taken by automatically controlled but manually directed filming cameras allows for 
operator choice of perspective views that can be combined with the automated system choices; 

6. Video taken and audio recorded of the spectators including coaches, team benches and fans 

From the foregoing detailed description of the present invention, it will be apparent that the invention has a 
number of advantages, some of which have been described above and others that are inherent in the 
invention. Also, it will be apparent that modifications can be made to the present invention without 
departing from the teachings of the invention. Accordingly, the scope of the invention is only to be limited 
as necessitated by the accompanying claims. 




Brief Description of the Drawings 

Fig. 1 is a block diagram depicting the major sub-systems of the Automatic Event Videoing, Tracking and 
Content Generation System, including: a tracking system, an automatic game filming system, an interface 
to manual game filming, an automatic spectator tracking & filming system, a player & referee 
identification system, a game clock interface system, a performance measurement & analysis system, 
an interface to performance commentators, an automatic content assembly & compression system as well 
as a broadcast decoder. 

Fig. 2 is a top view drawing of the preferred embodiment of the tracking system in the example setting of 
an ice-hockey rink, depicting an array of overhead X-Y tracking / filming cameras that when taken 
together form a field of view encompassing the skating and bench area of a single ice surface. Also 
depicted are perspective Z tracking cameras set behind each goal, as well as automatic pan, tilt and zoom 
perspective filming cameras. 

Fig. 3 is a combined drawing of a perspective view of the array of overhead X-Y tracking / filming 
cameras wherein a single camera has been broken out into a side view depicting a single tracking area in 
which a player is being tracked. Along with the depicted tracking camera is an associated filming camera 
that is automatically directed to follow the player based upon the tracking information collected by the 
overhead array. The player side view has then been added to a top view of a sheet of ice showing multiple 
players moving from the entrance area, on to and around the ice and then also onto and off the player's 
benches, all within the tracking area. 

Fig. 4a is a perspective drawing depicting the preferred visible light camera that is capable of viewing a 
fixed area of the playing surface below and gathering video frames that when analyzed reveal the moving 
players, equipment, referees and puck. 

Fig. 4b is a top view depiction of a key element of the process for efficiently extracting the foreground 
image of the player being tracked by tracing around the outline of the moving player that is formed during 
a process of comparing the current captured image to a pre-known background. 

Fig. 4c is a top view of a portion of an ice arena showing a series of tracked and extracted motions of a 
typical hockey player, stick and puck by the overhead X-Y tracking / filming cameras depicted in Fig. 4a. 

Fig. 5a is a block diagram depicting the preferred embodiment of the tracking system comprising a first 
layer of overhead tracking / filming cameras that capture motion on the tracking surface and feed full 
frame video to a second layer of intelligent hubs. By subtracting pre-known backgrounds, the hubs extract 
from the full frames just those portions containing foreground objects. Additionally, the hubs create a 
symbolic representation from the extracted foreground, after which the foreground object and symbolic 
representation are fed into an optional third layer of multiplexers. The multiplexers then create separate 
streams of foreground objects and their corresponding symbolic representations which are passed on to the 
automatic content assembly & compression system and the tracking system, respectively. 
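
The hub behavior described for Fig. 5a, subtracting a pre-known background and keeping only the portion of the full frame that contains foreground, can be sketched as follows. This is a simplified illustration under stated assumptions, not the hub's actual processing: images are plain lists of grayscale rows, a single rectangular block is returned, and the function name and `threshold` value are hypothetical.

```python
def extract_foreground_block(current, background, threshold=30):
    """Return (top, left, block) for the smallest rectangle that contains
    every pixel differing from the background, or None if nothing changed.

    current, background -- equally sized lists of rows of grayscale values
    threshold           -- hypothetical minimum difference treated as foreground
    """
    # Rows that contain at least one changed pixel.
    rows = [r for r, (cur_row, bg_row) in enumerate(zip(current, background))
            if any(abs(c - b) > threshold for c, b in zip(cur_row, bg_row))]
    if not rows:
        return None
    # Columns that contain at least one changed pixel within those rows.
    cols = [c for c in range(len(current[0]))
            if any(abs(current[r][c] - background[r][c]) > threshold
                   for r in rows)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    block = [row[left:right + 1] for row in current[top:bottom + 1]]
    return top, left, block


background = [[0] * 5 for _ in range(4)]
current = [[0, 0, 0, 0, 0],
           [0, 0, 200, 0, 0],
           [0, 0, 200, 200, 0],
           [0, 0, 0, 0, 0]]
result = extract_foreground_block(current, background)
```

Only the 2 x 2 block and its position within the frame need to leave the hub in this example, rather than the full 4 x 5 frame, which is the source of the bandwidth reduction described above.
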
Fig. 5b is a graph depicting the sinusoidal waveform of a typical 60 Hz power line as would be found in 
a normal building in North America such as a hockey rink. Also depicted are the lamp discharge moments 
that are driven by the rise and fall of the power curve. And finally, there are shown the moments when the 




camera shutter is ideally activated such that it is synchronized with the maximum acceptable range of 
ambient lighting corresponding to the lamp discharge moments. 
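
The timing relationship of Fig. 5b can be illustrated numerically. The sketch below assumes, for illustration only, that the lamp discharge is strongest at each peak of the absolute line voltage (twice per cycle, so 120 times per second on a 60 Hz line) and that the camera frame rate divides that peak rate evenly. The function name and parameters are hypothetical.

```python
import math


def shutter_times(line_hz=60.0, frames_per_second=30, duration_s=0.1):
    """Return shutter trigger times (seconds) aligned with voltage peaks.

    Assumes the lamps discharge at every peak of |sin(2*pi*line_hz*t)|,
    which occurs once per half cycle of the power line.
    """
    peaks_per_second = 2 * line_hz
    peaks_per_frame, remainder = divmod(peaks_per_second, frames_per_second)
    if remainder:
        raise ValueError("frame rate must divide the lamp discharge rate")
    first_peak = 1.0 / (4.0 * line_hz)    # a quarter cycle after zero crossing
    half_period = 1.0 / (2.0 * line_hz)   # spacing between successive peaks
    times = []
    k = 0
    while True:
        t = first_peak + k * peaks_per_frame * half_period
        if t >= duration_s:
            break
        times.append(t)
        k += 1
    return times


times = shutter_times(line_hz=60.0, frames_per_second=30, duration_s=0.1)
peak_levels = [abs(math.sin(2 * math.pi * 60.0 * t)) for t in times]
```

With these values every fourth discharge moment is used, so each captured frame sees the same point on the power curve.
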

Fig. 5c is a graph depicting the sinusoidal waveform of a typical 60 Hz power line that has been clipped on 
every other cycle as a typical way of cutting down on the integrated emission of the lighting over a given 
time period; essentially dimming the lights. Synchronized with the clipped waveform are the camera 
shutter pulses thereby ensuring that the cameras are filming under "full illumination," even when the 
ambient lighting appears to the human eye to have been reduced. 

Fig. 6a depicts a series of images representing the preferred method steps executed by the intelligent hubs 
in order to successfully extract a foreground object, such as a player and his stick, from a fixed background 
that itself undergoes slight changes over time. 

Fig. 6b depicts the breakdown of a foreground image into its pre-known base colors and its remaining 
grayscale overlay. Both the pre-known base colors and the grayscale overlay can optionally be represented 
as a "patch-work" of individual single color or single grayscale areas. 

Fig. 6c depicts the same foreground image shown in Fig. 6b being first broken into distinct frames. The 
first frame represents the minimum area known to include the player's viewable face with all other pixels 
set to null. The second frame includes the entire foreground image as found in Fig. 6b except that the 
pixels associated with the players' faces have been set to null. 

Fig. 6d depicts the same separated minimum area known to include the player's viewable face as a stream 
of successive frames shown in Fig. 6c in which the filming camera happened to be first zooming in and then 
zooming out. Also shown is a carrier frame used to normalize in size via digital expansion all individual 
successive frames in the stream. 

Fig. 6e depicts the same stream of successive frames shown in Fig. 6d except that now each frame has been 
adjusted as necessary in order to fit the normalized carrier frame. 

Fig. 6f depicts the preferred helmet sticker to be affixed to individual players providing both identity and 
head orientation information. 

Fig. 7a - 7d are the same top view depiction of three players skating within the field-of-view of four 
adjacent cameras. Fig. 7a shows the single extracted foreground block created from the view of the top left 
camera. Fig. 7b shows the two extracted foreground blocks created from the top right camera. Fig. 7c 
shows the single extracted foreground block created from the bottom left camera, and Fig. 7d shows the 
three extracted foreground blocks created from the bottom right camera. 

Fig. 7e shows the same top view of three players as shown in Fig. 7a - 7d, but now portrays them as a 
single combined view created by joining the seven extracted foreground blocks created from each of the 
respective four individual camera views. 

Fig. 8 shows two side-by-side series of example transformations of an original current image into a 
gradient image (player outlines) and then a symbolic data set that eventually provides information 
necessary to create meaningful graphic overlays on top of the original current image. 

Fig. 9a is a side view drawing of a single overhead tracking camera looking down on a specific area of the 
tracking surface. On the leftmost portion of the surface there are two players; one standing and one lying 




on the ice surface. The helmet of both players appears to be at the same "X+n" location in the captured 
image due to the distortion related to the angled camera view. On the rightmost portion of the surface there 
is a single player whose helmet, and identifying helmet sticker, is just straddling the edge of the camera's 
field-of-view. 

Fig. 9b is a side view drawing of two adjacent overhead tracking cameras A and B. Each camera is shown 
in two to three separate locations, i.e. Position 1, 2 and 3, providing four distinct overlapping strategies for 
viewing the tracking surface below. Also depicted are two players, each wearing an identifying helmet 
sticker and standing just at the edge of each camera's field-of-view. 

Fig. 9c is a side view drawing detailing a single area of tracking surface in simultaneous calibrated view of 
two adjacent overhead tracking cameras A and B. Associated with each camera is a current image of the 
camera's view as well as a stored image of the known background; all of which can be related to assist in 
the extraction of foreground objects. 

Fig. 9d is identical to Fig. 9c except that players have been added causing shadows on the tracking surface. 
These shadows are shown to be more easily determined as background information using the associated 
current images of overlapping cameras as opposed to the stored image of the known background. 

Fig. 9e is identical to Fig. 9d except that the players have been moved further into view of both cameras 
such that they block the view of a selected background location for camera B. Also, each camera now 
contains three associated images. Besides the current and stored background, a recent average image has 
been added that is dynamically set to the calculated range of luminescence values for any given 
background location. 

Fig. 9f is identical to Fig. 9e except that the players have been moved even further into view of both 
cameras such that they now block the view of selected background locations for both camera B and camera 
A. 

Fig.'s 10a through 10h in series depict the layout of overhead tracking camera assemblies. The series 
progresses from the simplest layout that minimizes the total number of cameras, to a more complex layout 
that includes two completely overlapping layers of cameras, wherein the cameras on each layer further 
partially overlap each other. 

Fig. 11a is a combination block diagram depicting the automatic game filming system and one of the 
cameras it controls along with a perspective view of the camera and a portion of an ice hockey rink. The 
resulting apparatus is capable of controlled movement synchronized with the capturing of images, thereby 
limiting viewed angles to specific pan / tilt and zoom increments while still providing the desired frames 
per second image capture rate. Synchronized images captured at these controlled filming angles and depths 
may then be collected before a contest begins, thus forming a database of pre-known image backgrounds 
for every allowed camera angle / zoom setting. 

Fig. 11b is identical to Fig. 11a except that it further includes two overhead tracking cameras shown to be 
simultaneously viewing the same area of the tracking surface as the perspective view game filming camera. 
Also depicted are current images, background images and average images for each overhead camera as 




well as the resulting three-dimensional topological information created for one of the players in 
simultaneous view of both the overhead and perspective cameras. 

Fig. 11c is identical to Fig. 11b except that it further includes a projection onto the current view of the 
perspective filming camera of the three-dimensional topological information determined by the overhead 
cameras. 

Fig. 11d is a top view diagram depicting the view of the perspective filming camera shown in Fig.'s 11a, 
11b and 11c as it captures an image of a player. The player's profile is also shown as would be calculable 
based upon image analysis from two or more overhead cameras. Also shown is a representation of the 
current image captured from the perspective filming camera with the calculated profile overlaid onto the 
calibrated pixels. 

Fig. 11e is identical to Fig. 11d except that the boards are also depicted directly behind the player. Just 
beyond the boards, and in view of the perspective filming camera, are depicted three spectators that are out 
of the playing area and form the moving background. The image of the player has also been shown on top 
of the calculated profile within the current image. 

Fig. 11f is an enlarged depiction of the current image of the perspective filming camera as shown in Fig. 
11e. The image is shown to consist of two distinct areas, the fixed background called Area F representing 
for instance the boards, and the potentially moving background called Area M representing for instance the 
view of the crowd through the glass held by the boards. Overlaying these two areas is a third area created 
by the calculated player profile called Area O because it contains any detected foreground object(s) such as 
the player. Also shown is the separation of the current image into its two sections of Area M and Area F. In 
both separated images, Area O with a foreground object is present. Also shown is Region OM representing 
just that portion of Area M enclosed within the overlaying calculated player (foreground object) profile. 

Fig. 11g is similar to Fig. 11e except that a companion stereoscopic filming camera has been added to work in 
conjunction with the perspective filming camera. Also, the arm of the player in the view of the filming 
camera has been lifted so that it creates an area within region OM where a significant portion of the 
moving background spectators can be seen. The second stereoscopic filming camera is primarily used to 
help support edge detection between the player and the moving background spectators within region OM. 

Fig. 11h is an enlarged depiction of Area M of the current image of the perspective filming camera as 
shown in Fig. 11g. Area M is depicted to have two distinct types of player edge points; one in view of the 
overhead assemblies and essentially "exterior" to the player's upper surfaces and the other blocked from 
the view of the overhead assemblies and essentially "interior" to the player's surfaces. 

Fig. 11i is an enlarged depiction of Region OM within Area M as shown in Fig. 11h. 

Fig. 11j expands upon the filming camera with second stereoscopic camera as shown in Fig. 11g. Also 
shown is a second companion stereoscopic camera such that the filming camera has one companion on 
each side. The main purpose for the stereoscopic cameras remains to help perform edge detection and 
separation of the foreground object players from the moving background object spectators. Also shown in 
this top view is an expanded portion of the tracking area which in this case is a hockey rink encircled by 
boards and glass, outside of which can be seen moving background spectators. An additional ring of fixed 




overhead perspective cameras has been added specifically to image the moving background areas just 
outside of the tracking area where spectators are able to enter the view of the perspective filming cameras. 
The purpose of these fixed overhead background filming cameras is to provide additional image 
information useful for the separation of foreground players from moving background spectators. 

Fig. 12 is a combination block diagram depicting the interface to manual game filming system and one of 
the fixed manual game filming cameras along with a perspective view of the fixed camera as it captures 
images. The resulting apparatus is capable of either detecting the exact pan / tilt angles as well as zoom 
depths at the moment of image capture or limiting the moment of image capture to an exact pan / tilt angle 
as well as zoom depth. These sensed and captured angles and depths allow the automatic content assembly 
& compression system to coordinate the tracking information collected through the overhead X-Y tracking 
/ filming cameras and processed by the performance measurement & analysis system with any film 
collected by manual effort. This coordination will result in the potential for overlaying graphic information 
onto the existing manual game broadcast as well as the ability to determine which additional viewing 
angles have been collected by the manual camera operator and may be of interest to mix with the 
automatically captured film. 

Fig. 13 is a combination block diagram depicting the interface to manual game filming system and one of 
the roving manual game filming cameras along with a perspective drawing of a cameraman holding a 
roving manual camera. The roving camera's current location and orientation is being tracked by local 
positioning system (LPS) transponders. This tracked location and orientation information allows the 
automatic content assembly & compression system to coordinate the tracking information collected 
through the overhead X-Y tracking / filming cameras and processed by the performance measurement & 
analysis system with any film collected by roving manual effort. This coordination will result in the 
potential for overlaying graphic information onto the existing manual game broadcast as well as the ability 
to determine which additional viewing angles have been collected by the manual camera operator and may 
be of interest to mix with the automatically captured film. 

Fig. 14 is a combination block diagram depicting the player & referee identification system and one of the 
identification cameras along with a perspective drawing of a player on the tracking surface. The player's 
current location and orientation with respect to the hockey rink axis are being used to automatically direct 
at least one ID camera to capture images of the back of his jersey. These zoomed-in captures of the jersey 
numbers and names are then pattern matched against the database of pre-known jerseys for the current 
game resulting in proper identification of players and referees. 

Fig. 15 shows the quantum efficiency curves for two commercially available CMOS image sensors. The top 
curve shows a monochrome (grayscale) sensor's ability to absorb light while the bottom curve shows a 
color sensor. Both sensors have significant ability to absorb non-visible frequencies at least in the near 
infrared region. 

Fig. 16a, 16b and 16c are various sensor arrangements and in particular Fig. 16a shows a typical 
monochrome sensor, Fig. 16b shows a typical color sensor and Fig. 16c shows an alternate monochrome / 
non-visible IR sensor. 




Fig. 16d shows a three element CMOS camera where the light entering through the lens is split into three 
directions simultaneously impacting two monochrome sensors and a combined monochrome / IR sensor. 
The visible frequencies of 400 to 500 nm (blue) are directed to the first monochrome sensor; the visible 
frequencies of 500 to 600 nm (green) are directed to the second monochrome sensor; and the visible and 
near IR frequencies of 600 to 1000 nm are directed to the monochrome / IR sensor. This solution 
effectively creates a color camera with IR imaging capability. 

Fig. 16e shows a two element CMOS camera where the light entering through the lens is split into two 
directions simultaneously impacting both color and monochrome sensors. The visible frequencies of 400 to 
700 nm are directed to the color sensor while the non-visible near IR frequencies of 700 to 1000 nm are 
directed to the monochrome sensor thereby creating two overlapping views of a single image in both the 
visible and non-visible regions. 

Fig. 16f is exactly similar to Fig. 16e except that both sensors are monochrome. 

Fig. 17 depicts a series of three steps that process the combination visible image and non-visible image. In 
step one the current visible image is extracted from its background. In step two the extracted portion only 
is used to direct a search of the non-visible image pixels in order to locate contrast variations created in the 
non-visible IR frequencies due to the addition of either IR absorbing or IR reflecting or retro-reflecting 
marks on the foreground objects. In step three the determined high contrast markings are converted into 
centered points which may then be used to create a continuous point model of the body motion of all 
foreground objects. 
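
The three steps described for Fig. 17 can be sketched as follows. This is an illustrative simplification rather than the disclosed processing: images are plain lists of grayscale rows, marks are assumed to be IR reflecting (bright) rather than absorbing, and the function name and both thresholds are hypothetical.

```python
def mark_centers(visible, background, infrared,
                 fg_threshold=30, ir_threshold=200):
    """Return sorted (row, column) centers of bright IR marks found on
    foreground objects.

    visible, background, infrared -- equally sized lists of grayscale rows
    fg_threshold, ir_threshold    -- hypothetical detection thresholds
    """
    height, width = len(visible), len(visible[0])
    # Step one: extract the foreground of the visible image.
    foreground = [[abs(visible[r][c] - background[r][c]) > fg_threshold
                   for c in range(width)] for r in range(height)]
    # Step two: search the IR image only where foreground was found.
    marked = {(r, c) for r in range(height) for c in range(width)
              if foreground[r][c] and infrared[r][c] >= ir_threshold}
    # Step three: group touching marked pixels and reduce each group
    # to a single centered point.
    centers = []
    while marked:
        stack = [marked.pop()]
        group = []
        while stack:
            r, c = stack.pop()
            group.append((r, c))
            for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbor in marked:
                    marked.remove(neighbor)
                    stack.append(neighbor)
        centers.append((sum(p[0] for p in group) / len(group),
                        sum(p[1] for p in group) / len(group)))
    return sorted(centers)


background = [[0] * 6 for _ in range(5)]
visible = [[100 if 1 <= r <= 3 and 1 <= c <= 4 else 0 for c in range(6)]
           for r in range(5)]
infrared = [[0] * 6 for _ in range(5)]
infrared[1][1] = infrared[1][2] = 255   # a two-pixel mark on the player
infrared[3][4] = 255                    # a second mark on the player
infrared[0][5] = 255                    # IR glare outside the foreground
centers = mark_centers(visible, background, infrared)
```

The bright IR pixel outside the extracted foreground is never examined, which reflects the point of step two: the visible extraction limits where the non-visible image is searched.
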

Fig. 18 depicts the various types of cameras used in the present invention including the overhead tracking 
cameras, the player identification cameras, the player filming cameras and the player filming and 
three-dimensional imaging cameras. 

Fig. 19 is a combination block diagram depicting the spectator tracking & filming system along with a 
perspective drawing of a hockey rink and mainly of the spectators outside of the tracking surface such as 
other players, the coach and fans. Also depicted are the processing elements that control the tracking of the 
location of these spectators as well as their automatic filming. This part of the present invention is 
responsible for capturing the spectator audio / video database whose images and sounds may then be used 
by the automatic content assembly & compression system to combine into a more complete multi-media 
recreation of the game as a thematic story. 

Fig. 20 is a combination perspective drawing of the hand of an ice hockey referee that has been outfitted 
with a combination puck-drop pressure sensor and whistle air-flow sensor along with the game control 
computer and / or game controller box that work to manually control the game scoreboard. By 
automatically sensing the puck-drop and therefore play start-time along with the whistle air-flow and 
therefore play stop-time, the game clock interface system is able to both automatically operate the game 
clock and indicate key timing information to the tracking and game filming systems. 
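
The clock logic implied by Fig. 20, starting on a sensed puck-drop and stopping on a sensed whistle, can be sketched as a simple accumulation over time-stamped sensor events. The event names and the function name are hypothetical and are used only for illustration.

```python
def elapsed_play_time(events):
    """Return total seconds of play from time-stamped sensor events.

    events -- iterable of (timestamp_seconds, kind) pairs, where kind is
              "puck_drop" (play starts) or "whistle" (play stops);
              both labels are hypothetical
    """
    running_since = None
    total = 0.0
    for timestamp, kind in sorted(events):
        if kind == "puck_drop" and running_since is None:
            running_since = timestamp          # clock starts
        elif kind == "whistle" and running_since is not None:
            total += timestamp - running_since  # clock stops
            running_since = None
    return total


events = [(0.0, "puck_drop"), (42.5, "whistle"),
          (60.0, "puck_drop"), (75.0, "whistle"),
          (80.0, "whistle")]   # a second whistle while stopped is ignored
play_seconds = elapsed_play_time(events)
```

The same start and stop moments could also be passed to the tracking and filming systems as timing markers, as the figure description suggests.
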
Fig. 21 is a block diagram depicting the side-by-side flow of information starting with the game itself as it 
is then subjectively assessed by the coaching staff and objectively assessed by the performance 
measurement and analysis system. This side-by-side flow results in the ultimate comparison of the 




subjective and objective assessments thereby creating a key feed-back loop for both the coaching staff and 
performance measurement and analysis system. 

Fig. 22 is a series of perspective view representations of the overall method embodied in the present 
application for the capturing of current images, the extraction of the foreground objects, and the 
transmission of these minimal objects to be later placed on top of new backgrounds with potentially 
inserted advertising. 

Fig. 23 is two side-by-side series of overhead images designed to illustrate the bandwidth savings taught in 
the present invention that is predicated on the extraction of foreground objects from the background of the 
current image. A third series is also shown that represents a symbolic dataset of extracted foreground object 
movement vectors and related measurements. 

Fig. 24 shows the same two side-by-side series of overhead images found in Fig. 23 from a perspective 
view so as to accentuate both the reduction in transmitted information and the change from a fixed to 
variable transmission frame. 

Fig. 25 shows the new condensed series of overhead images as portrayed in both Fig. 23 and Fig. 24 in 
both its original and converted formats. In the converted format, each sub-frame is rotated and centered 
within a carrier frame. 

Fig. 26 shows the rotated and centered series of Fig. 25 where each sub-frame has been additionally 
"scrubbed" of any detected background pixels thereby maximizing its compression potential. 

Fig. 27 is a perspective view of a filming camera as it captures background images prior to a game. These 
images are then appended into a larger panoramic background database as opposed to being stored 
individually. The database is keyed and accessible by the pan and tilt angles of the filming camera which 
are both set to be multiples of a minimum increment. 
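
Keying the background database by pan and tilt angles that are multiples of a minimum increment can be sketched as follows. This is a simplification under stated assumptions: a dictionary of separately stored images stands in for the single panoramic background described above, and the 0.5 degree increment and all function names are hypothetical.

```python
def angle_key(pan_deg, tilt_deg, increment_deg=0.5):
    """Snap pan and tilt to whole multiples of the minimum increment and
    return them as integer step counts, which are safe dictionary keys."""
    return round(pan_deg / increment_deg), round(tilt_deg / increment_deg)


def store_background(database, pan_deg, tilt_deg, image, increment_deg=0.5):
    """File a pre-game background image under its snapped pan / tilt key."""
    database[angle_key(pan_deg, tilt_deg, increment_deg)] = image


def lookup_background(database, pan_deg, tilt_deg, increment_deg=0.5):
    """Return the background stored for the nearest allowed angle, if any."""
    return database.get(angle_key(pan_deg, tilt_deg, increment_deg))


database = {}
store_background(database, 10.0, -2.5, "background at pan 10.0, tilt -2.5")
found = lookup_background(database, 10.1, -2.4)    # snaps to 10.0 / -2.5
missing = lookup_background(database, 11.0, -2.5)  # no image stored there
```

Because the camera is only ever stopped at allowed increments, every captured frame during the game maps to exactly one stored background, which is what makes later foreground extraction from a moving camera tractable.
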

Fig. 28 is identical to Fig. 27 except that it illustrates the impact of zooming the filming camera at any 
given pan and tilt angle. Zooming factors are purposely restricted in order to ensure that any individual 
pixel, for any given zoom setting, is always a whole multiple of the smallest pixel captured at the highest 
zoom setting. Furthermore, the movement of the filming camera is purposely restricted so that any 
individual pixel, for any given zoom setting, is also centered about a single pixel captured at the highest 
zoom setting. 

Fig. 29a depicts the flow of relevant video and tracking information from its origin in the overhead 
tracking system to its destination in the broadcast encoder. There are three types of datasets shown. First, 
there are a majority of datasets (shown as light cylinders) representing the evolution of the "current frame" 
from its raw state into its final segmented and analyzed form. Second, there are three datasets (shown as the 
darkest cylinders) representing "pre-known" or "pre-determined" information that is critical for the process 
of segmenting the "current frame" into its desired parts. And finally, there are four datasets (shown as the 
medium toned cylinders) representing "accumulated frame and analysis" information that is also critical 
for the segmenting of the "current frame." 



Fig. 29b depicts a similar flow of relevant video and audio information from its origin in the automatic 
filming and audio recording systems to its destination in the broadcast encoder. The light, dark and 
medium tone cylinders have similar meaning as in Fig. 29a. 

Fig. 29c depicts five distinct combinations of encoded datasets referred to as profiles. The datasets 
encompass video, metrics (tracking information), and audio. The least segmented Profile A contains the 
unprocessed current stream of video information and audio recordings, similar to the input into today's 
current encoders, such as MPEG2, MPEG4, H.264, etc. The most segmented Profile C3 consists of various 
translated sub-portions of the current stream of video and audio as discussed in Fig.'s 29a and 29b. The 
increase in the segmentation of the encoded data is anticipated to yield increased data compression over 
current methods working on the simplest first profile. 

Fig. 29d depicts the four segmented Profiles B through C3, out of the five possible shown in Fig. 29c. 
Each of these four is optionally accepted by the broadcast decoder in order to reverse into a current stream 
of images, associated audio and relevant metrics (tracking information). Similar to Fig.'s 29a and 29b, 
dark cylinders represent pre-known datasets, medium tone cylinders represent accumulated information 
while light cylinders represent current data being reassembled in order to create the final broadcast video. 




Claims 

Claim 1: A system for capturing video of a scene through one or more cameras and creating a first video
database of extracted foreground blocks minimally representing all foreground objects present and within
the cameras' field(s)-of-view, comprising:

an arrangement of one or more fixed first cameras, where the X, Y, Z location of each first camera with
respect to the scene is calibrated and known to the system, where each first camera's subsequent X, Y, Z
projection of its field-of-view onto the scene is further calibrated and known to the system, and where each
first camera is both synchronized to an external trigger and capable of capturing full frames of pixels
representing its field-of-view;

a first algorithm operated by computing elements, and initiated either by an external trigger or in response
to a stored clock time, for directing each fixed first camera to capture at least one full frame of the scene,
where each full frame serves as a background image of that camera's particular field-of-view before the
entrance of any foreground objects, and where the background image is output to at least the computing
elements for operating a third algorithm;

a second algorithm operated by computing elements, and initiated either by an external trigger or in
response to a stored clock time, for subsequently directing each fixed first camera to begin capturing
ongoing full frames of the scene, where each full frame serves as the next current image of that camera's
particular field-of-view before, during or after the entrance of any foreground objects, and where the
next current images are output to at least the computing elements for operating a third algorithm;

a third algorithm operated by computing elements, for comparing each next current image, from each fixed
first camera, against each same camera's prior captured background image in order to locate every distinct
contiguous group of a minimum number of determinable foreground pixels, where each group is then
extracted as a minimum set, preferably in rectangular block format with associated information indicating
each block's originating first camera as well as its relative row and column coordinates within its original
full frame, and where each extracted foreground block and associated information are then output to at
least the computing elements for operating a fourth algorithm, and

a fourth algorithm operated by computing elements, for combining and optionally compressing the output of
the third algorithm, or some derivative thereof, into a first video database.
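
The comparison and extraction performed by the third algorithm of Claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the function name, the dictionary layout, the simple per-pixel difference threshold and the 4-connected grouping rule are all assumptions made for illustration, and images are taken to be plain lists of grayscale rows.

```python
from collections import deque


def extract_foreground_blocks(background, current, camera_id, threshold=30, min_pixels=2):
    """Return the bounding rectangle of each connected group of foreground pixels."""
    rows, cols = len(current), len(current[0])
    # Assumption: a pixel is foreground when it differs from the background by more than threshold.
    is_fg = [[abs(current[r][c] - background[r][c]) > threshold for c in range(cols)]
             for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    blocks = []
    for r in range(rows):
        for c in range(cols):
            if not is_fg[r][c] or seen[r][c]:
                continue
            # Flood-fill one 4-connected group starting at (r, c).
            group, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                group.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and is_fg[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(group) < min_pixels:
                continue  # too few pixels to count as a distinct foreground group
            top, bottom = min(y for y, _ in group), max(y for y, _ in group)
            left, right = min(x for _, x in group), max(x for _, x in group)
            blocks.append({
                "camera": camera_id,   # originating camera
                "row": top,            # block position within the full frame
                "col": left,
                "pixels": [row[left:right + 1] for row in current[top:bottom + 1]],
            })
    return blocks
```

Each returned block carries the originating camera and the row and column coordinates of its upper-left corner, which is the associated information the claim says accompanies every extracted block.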

Claim 2: The system of claim 1, where the scene is substantially illuminated by one or more alternating
current driven light sources, further comprising:

inputs for accepting alternating current consistent with that provided to the light sources and circuits for
converting the sinusoidal waveform of the alternating current into an internal trigger wave, where the
trigger wave is synchronized to some full or fractional beat of the lighting discharge cycle, and for
providing this internal trigger wave to computing elements of both the first and second algorithms to act as
the external trigger wave.
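
As a rough software analogue of the trigger circuit in Claim 2, the sketch below derives trigger instants from the zero crossings of a sampled sinusoid. The claim describes circuits, not code, so everything here is an assumption: the function names, the use of zero crossings as the synchronization point, and the ideal test signal standing in for the AC input. A sinusoid crosses zero twice per cycle, so a 60 Hz input yields triggers 1/120 s apart.

```python
import math


def trigger_times(samples, sample_rate):
    """Return interpolated times (seconds) at which the sampled AC waveform crosses zero."""
    times = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev * cur < 0:  # sign change between two consecutive samples
            # Linear interpolation locates the crossing between the two samples.
            fraction = prev / (prev - cur)
            times.append((i - 1 + fraction) / sample_rate)
    return times


def sampled_mains(frequency_hz, sample_rate, duration_s, phase=0.3):
    """Hypothetical test signal: an ideal sinusoid standing in for the AC input."""
    count = int(round(duration_s * sample_rate))
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate + phase) for n in range(count)]
```

Keeping every crossing, or only every n-th one, gives a trigger on a full or fractional beat of the cycle; which beat suits a given lamp type is outside the scope of this sketch.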

Claim 3: The system of claim 1, further comprising:

additional capability within the third algorithm to set all determined background pixels within the extracted
foreground blocks to some recognizable, preferably null value.




Claim 4: The system of claim 1, further comprising:

additional capability within the third algorithm for continuously using the newly determined background
pixels in the current image to update the corresponding pixels in the corresponding background image.
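
One way to picture the continuous background update of Claim 4 is a running blend applied only to pixels judged to be background. The claim does not state an update rule, so the weighted average, the `alpha` parameter and the function name below are assumptions for illustration.

```python
def update_background(background, current, foreground_mask, alpha=0.1):
    """Blend newly determined background pixels into the stored background image, in place."""
    for r, row in enumerate(current):
        for c, value in enumerate(row):
            if foreground_mask[r][c]:
                continue  # foreground pixels never alter the background image
            # Assumed rule: move the stored value a fraction alpha toward the new value.
            background[r][c] = (1 - alpha) * background[r][c] + alpha * value
    return background
```

A small `alpha` lets the background follow slow lighting drift while ignoring brief noise; setting `alpha=1` would simply overwrite each background pixel with its newest value.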

Claim 5: The system of claim 1, further comprising:

additional capability within the fourth algorithm to optionally represent the unique color information of each
foreground pixel or block of pixels as a single minimal bit code representing a specific combination(s) of
traditionally recognized color values such as U, V or H, S, where the potential codes are pre-established in
a color table available to the algorithm prior to the capturing of images, and where the potential codes only
represent those distinct U, V or H, S colors pre-known to exist, or predominantly exist, within the
anticipated foreground objects.
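
The color table of Claim 5 can be sketched as a short list of pre-known (U, V) pairs indexed by a code just wide enough to address every entry. The table contents, the nearest-entry matching rule and the function names are hypothetical; the claim only requires that the codes be pre-established and limited to the colors expected in the foreground objects.

```python
def bits_needed(table):
    """Smallest number of bits able to index every entry of the color table."""
    return max(1, (len(table) - 1).bit_length())


def encode_color(table, u, v):
    """Return the index (code) of the table entry closest to the measured (u, v)."""
    return min(range(len(table)),
               key=lambda i: (table[i][0] - u) ** 2 + (table[i][1] - v) ** 2)


def decode_color(table, code):
    """Recover the table color that a code stands for."""
    return table[code]
```

With three entries, for example, every pixel's color costs two bits instead of two full-width U and V values.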

Claim 6: The system of claim 1, further comprising:

additional capability within the fourth algorithm to optionally represent the foreground pixels in either the
original image sensor format such as the Bayer pattern, or as separate color information in one of the
following formats:

- the U, V, or H, S or similar traditional encoding of the original sensor format
for either each and every pixel or blocks of pixels, or

- a set of color underlay regions defined to be all contiguous pixels, exceeding some
minimum count, within a set range of one or more of the original sensor or traditional color
values, where each region is encoded as some representation of its encompassing edge plus a
single representation of the interior region's assigned color in a format such as U, V or H, S;

and as separate grayscale (luminosity) information in one of the following formats:

- the Y or I or similar traditional encoding of the original sensor format designated for
either each and every pixel or blocks of pixels, or

- a set of grayscale overlay regions defined to be all contiguous pixels, exceeding some
minimum count, within a set range of one or more of the original sensor or traditional grayscale
values, where each region is encoded as some representation of its encompassing edge plus a
single representation of the interior region's assigned grayscale in a format such as Y or I.

Claim 7: The system of claim 6, further comprising:

additional capability within the fourth algorithm to optionally represent the unique color of each foreground
pixel, block of pixels or region of pixels as a single minimal bit code representing a specific
combination(s) of traditionally recognized color values such as U, V or H, S, where the potential codes are
pre-established in a color table available to the algorithm prior to the capturing of images, and where the
potential codes only represent those distinct U, V or H, S colors pre-known to exist, or predominantly
exist, within the anticipated foreground objects.

Claim 8: The system of claim 1, where the arrangement of fixed first cameras comprises:

two or more fixed first cameras that have been arranged as a first grid so that their combined fields-of-view
form a single contiguous and substantially overhead view of the scene, and where the first grid's cameras
are further arranged so that all adjacent fields-of-view overlap at a height in excess of at least the tallest
expected free-standing foreground object to become present in the scene.

Claim 9: The system of claim 8 that additionally and concurrently creates a database of tracking
information relating to the foreground objects, further comprising:

a fifth algorithm operated by computing elements, and capable of receiving the output of the third
algorithm, for detecting within all extracted foreground block pixels the presence of object shapes, where
any and all recognizable shapes are translated into symbolic representations potentially including
information describing approximate shape, such as fitted circles, ellipses, curves or rectangles, the relative
X, Y locations of the shapes within the scene, the shape centroid and potentially its orientation, and where
the symbolic representation information is then output to at least the computing elements for operating a
sixth algorithm, and

a sixth algorithm operated by computing elements, for combining and outputting all symbolic
representations and related information into a foreground object tracking database synchronized with the
first video database.

Claim 10: The system of claim 9, further comprising:

one or more fixed second cameras that have been arranged so that their field(s)-of-view are at a perspective
orientation to the scene, as opposed to the substantially overhead placement of the first grid of fixed first
cameras, where the X, Y, Z location of each second camera with respect to the scene is calibrated and
known to the system, where each second camera's subsequent X, Y, Z projection of its field-of-view onto
the scene is further calibrated and known to the system, and where each second camera is both
synchronized to an external trigger and capable of capturing full frames of pixels representing its
field-of-view.

Claim 11: The system of claim 10, further comprising:

devices attached to the perspective fixed second camera(s) that are capable of receiving and executing
signals for controllably directing the panning, tilting and / or zooming of each camera;

a seventh algorithm operated by computing elements, for inputting the foreground object tracking database
from the sixth algorithm and using the foreground object tracking information to automatically provide
signals for directing the panning, tilting and / or zooming movements of each perspective fixed second
camera, in order to follow one or more foreground objects, where the captured images from each
perspective second camera are output as a stream of perspective video images along with some
representation of each image's associated pan angle, tilt angle and zoom depth and an indication of the
capturing camera's X, Y, Z field-of-view location.

Claim 12: The system of claim 11, further comprising:

sensing and control devices fitted to the perspective fixed second camera(s) that are capable of receiving
and executing signals for precisely and accurately controlling to a repeatable increment each camera's pan
angle, tilt angle and / or zoom depth, and

additional capability within the seventh algorithm to limit the signals for automatically directing the
panning, tilting and / or zooming movements of each perspective second camera so that each captured
image is taken at known and repeatable pan, tilt and / or zoom increments.
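
Limiting camera movement to repeatable increments, as in Claim 12, amounts to snapping each requested angle or zoom depth to the nearest setting on a fixed grid. The step sizes and function names below are hypothetical; the claim does not specify the increments.

```python
def snap_to_increment(value, step):
    """Return (index, snapped_value) for the repeatable setting nearest to value."""
    index = round(value / step)  # integer index identifies the setting exactly
    return index, index * step


def snap_ptz(pan, tilt, zoom, pan_step=0.5, tilt_step=0.5, zoom_step=0.25):
    """Snap a requested pan / tilt / zoom command to the assumed repeatable increments."""
    return {
        "pan": snap_to_increment(pan, pan_step),
        "tilt": snap_to_increment(tilt, tilt_step),
        "zoom": snap_to_increment(zoom, zoom_step),
    }
```

Storing the integer index alongside the snapped value gives an exact key for matching a current image to a background image captured at the same setting.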

Claim 13: The system of claim 12, further comprising:

an eighth algorithm operated by computing elements, and initiated either by an external trigger or in
response to a stored clock time, for directing each perspective second camera to capture at least one full
frame of the scene, to serve as a background image of that camera's particular field-of-view, at some or all
of the camera's total limited repeatable increment settings of pan angles, tilt angles, and / or zoom depths
such that the camera's field-of-view is swept across some or all of the encompassed scene prior to the
scene's occupation by foreground objects, and for creating and outputting a background image database
per each camera along with some representation of each image's associated pan angle, tilt angle and zoom
depth and an indication of the capturing camera's X, Y, Z field-of-view location.

Claim 14: The system of claim 13, further comprising:

additional capability within the seventh algorithm for assuring that each current image captured of the
scene by each perspective second camera is at a combination of pan angle, tilt angle and zoom depths that
either directly corresponds to an associated background image or is interpolable between neighboring
associated background images;

a ninth algorithm operated by computing elements, and capable of receiving both the outputs of the
seventh and eighth algorithms, for comparing in the original or modified formats, each next current video
image from each perspective second camera, with either the directly associated, or an interpolation of the
neighboring associated background image(s), in order to extract from the next current video image only
those blocks of pixels pertaining to the foreground objects, where within each block all background pixels
are optionally indicated or otherwise set to some null value, and for outputting to at least the computing
elements for operating a tenth algorithm only the extracted foreground blocks, or some derivative thereof,
along with some representation of each block's associated current image row and column coordinates, the
current pan angle, tilt angle and zoom depth, and the capturing camera's X, Y, Z field-of-view location,
and

a tenth algorithm operated by computing elements, for combining and optionally compressing the output
of the ninth algorithm, or some derivative thereof, into a perspective second video database.

Claim 15: The system of claim 14, further comprising:

additional capability within the ninth algorithm for continuously using the newly determined background
pixels in the current image to update the corresponding pixels in the corresponding background image.

Claim 16: The system of claim 9, where at least one of the foreground objects that are expected to be
present within the first grid's field-of-view is a scene participant, and where any such participants have
some externally viewable identifying marks, further comprising:

additional capability within the fifth algorithm for detecting within the extracted foreground block pixels
the presence and location of the identifying marks on each participant, and for subsequently directly
determining the interpretation of the markings, or using the markings to pattern match against a
predetermined database of potential markings with associated identities, in order to uniquely identify each
participant within the object tracking database.

Claim 17: The system of claim 16, further comprising:

one or more fixed third cameras that have been arranged so that their field(s)-of-view are at a perspective
orientation to the scene, as opposed to the substantially overhead placement of the first grid of fixed first
cameras, where the X, Y, Z location of each third camera with respect to the scene is calibrated and known
to the system, where each third camera's subsequent X, Y, Z projection of its field-of-view onto the scene
is further calibrated and known to the system, and where each third camera is both synchronized to an
external trigger and capable of capturing full frames of pixels representing its field-of-view;

sensing and control devices fitted to the perspective fixed third camera(s) that are capable of receiving and
executing signals for precisely and accurately controlling to a repeatable increment each camera's pan
angle, tilt angle and / or zoom depth;

an eleventh algorithm operated by computing elements, for inputting the foreground object tracking
database from the sixth algorithm and using the foreground object tracking information to automatically
provide signals for directing the panning, tilting and / or zooming movements of each perspective fixed
third camera, in order to follow one or more foreground participants and capture images of their
identifying marks, where the captured images of the identifying marks from each perspective third camera
are output to at least the computing elements for operating the fifth algorithm, and

additional capability within the fifth algorithm for detecting within the captured images
output by the eleventh algorithm, the presence and location of the identifying marks on each participant,
and for subsequently directly determining the interpretation of the markings, or using the markings to
pattern match against a predetermined database of potential markings with associated identities, in order to
uniquely identify each participant within the object tracking database.

Claim 18: The system of claim 17, further comprising:

a twelfth algorithm operated by computing elements, capable of inputting the first video database, the
perspective second video database and the object tracking database and creating and outputting a
camera-frame database listing each identified participant and / or object visible within each successive
captured image, for each first or second camera in use, along with the participant's and / or object's relative
intra-frame coordinates at least including the shape centroid, as well as the distance from the centroid to the
camera, providing a means for selecting all image sequences containing selected participants and / or
objects.

Claim 19: The system of claim 9, further comprising:

two or more fixed first cameras that have been arranged as a second grid so that their combined
fields-of-view form a single contiguous and substantially overhead view of the scene, overlapping with and
offset from the first grid, and where the second grid's cameras are further arranged so that all adjacent
fields-of-view overlap at a height in excess of at least the tallest expected free-standing foreground object
to become present in the scene.

Claim 20: The system of claim 19, further comprising:

additional capability within the fifth algorithm, for using the overlapping extracted foreground blocks from
the first and second grids as output by the third algorithm, especially including redundant blocks
containing additional views of the same object, to generate Z information via stereoscopic triangulation,
for as many detected objects as are visible from two or more cameras, where the Z information is appended
to the existing object tracking database that already includes symbolic representations of all detected
shapes and their ongoing X, Y locations with respect to the scene.
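
The stereoscopic triangulation named in Claim 20 can be illustrated with a generic two-ray construction. This is a sketch under stated assumptions, not the method disclosed in the application: each camera's X, Y, Z location is known, each camera reports the floor-plane point (Z = 0) where its line of sight through the object lands, and the object is taken to lie at the point nearest to both lines of sight. The function and parameter names are hypothetical.

```python
def triangulate(cam_a, ground_a, cam_b, ground_b):
    """Estimate the (X, Y, Z) point nearest to both viewing rays.

    cam_*    : (X, Y, Z) location of each camera.
    ground_* : (X, Y) where each camera's ray through the object meets the floor (Z = 0).
    """
    def direction(cam, ground):
        return (ground[0] - cam[0], ground[1] - cam[1], -cam[2])

    def dot(p, q):
        return sum(i * j for i, j in zip(p, q))

    u, v = direction(cam_a, ground_a), direction(cam_b, ground_b)
    w = tuple(i - j for i, j in zip(cam_a, cam_b))
    a, b, c, d, e = dot(u, u), dot(u, v), dot(v, v), dot(u, w), dot(v, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are parallel; no unique triangulation")
    # Parameters of the closest points on each ray (standard closest-approach formulas).
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    point_a = tuple(p + s * q for p, q in zip(cam_a, u))
    point_b = tuple(p + t * q for p, q in zip(cam_b, v))
    # The midpoint of the two closest points serves as the estimate.
    return tuple((i + j) / 2 for i, j in zip(point_a, point_b))
```

The third component of the result is the Z value that would be appended to the object's existing X, Y tracking record.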

Claim 21: The system of claim 20, further comprising:

arrangements of a third or more grids, where each additional grid contains two or more fixed first
cameras, so that the combined fields-of-view of the fixed first cameras within each additional grid form a
single contiguous and substantially overhead view of the scene, where the resulting entire combined
field-of-view of each additional grid is overlapping with and offset from the first, second and each other's
grids, and where the additional grid's first cameras are further arranged so that all adjacent fields-of-view
overlap at a height in excess of at least the tallest expected free-standing foreground object to become
present in the scene.

Claim 22: The system of claim 21, further comprising:

additional capability within the fifth algorithm, for using the overlapping extracted foreground blocks from
the first, second, third and more grids as output by the third algorithm, especially including redundant
blocks containing additional views of the same object, to generate Z information via stereoscopic
triangulation, for as many detected objects as are visible from two or more cameras, where the Z
information is appended to the existing object tracking database that already includes symbolic
representations of all detected shapes and their ongoing X, Y locations with respect to the scene.

Claim 23: A device for inputting multiple video streams of full frames from two or more external
connected cameras, where a full frame is made up of all of the pixels read from a camera sensor
representing that camera's field-of-view, and combining the multiple streams into a single output stream of
extracted video blocks, where all extracted blocks represent only those pixel portions of each of the
multiple full frames that contain a foreground object as opposed to the fixed background of the scene, and
where associated with each extracted video block is related information including some representation of
the originally capturing camera and the block's row and column coordinates within the original full frame,
and where at some time the fixed background was in view of the cameras before the pre-known or detected
entrance of the foreground objects, comprising:

a first algorithm operated by computing elements within the device, and initiated either by an external
trigger or in response to a stored clock time, for directing each connected camera to capture at least one full
frame of the scene, where each full frame serves as a background image of that camera's particular
field-of-view before the entrance of any foreground objects, and where the camera outputs the background
image to the device;

a second algorithm operated by computing elements within the device, and initiated either by an external
trigger or in response to a stored clock time, for subsequently directing each fixed camera to begin
simultaneously capturing ongoing full frames of the scene, where each full frame serves as a current image
of that fixed camera's particular field-of-view before, during or after the entrance of any foreground
objects, and where the camera outputs the current images to the device;

a third algorithm operated by computing elements within the device for comparing each next current
image, from each connected camera, against each camera's prior captured background image in order to
locate every distinct contiguous group of a minimum number of determinable foreground pixels, where
each group is then extracted as a minimum set, preferably in rectangular block format, and output to at
least the computing elements for operating a fourth algorithm, as a stream of extracted foreground blocks
along with the associated information of the originating camera as well as the row and column coordinates
with respect to the block's original full image, and

a fourth algorithm operated by computing elements within the device for combining and optionally
compressing the output of the third algorithm, or some derivative thereof, into a single output stream.

Claim 24: The device of claim 23, where the scene is substantially illuminated by one or more alternating
current driven light sources, further comprising:

inputs for accepting alternating current consistent with that provided to the light sources and circuits for
converting the sinusoidal waveform of the alternating current into an internal trigger wave, where the
trigger wave is synchronized to some full or fractional beat of the lighting discharge cycle, and for
providing this internal trigger wave to computing elements of both the first and second algorithms to act as
the external trigger wave.

Claim 25: The device of claim 23, further comprising:

additional capability within the third algorithm to set all determined background pixels within the extracted
foreground blocks to some recognizable, preferably null value.

Claim 26: The device of claim 23, further comprising:

additional capability within the third algorithm for continuously using the newly determined background
pixels in the current image to update the corresponding pixels in the background image.

Claim 27: The device of claim 23, further comprising:

additional capability within the fourth algorithm to optionally represent the unique color information of each
foreground pixel or block of pixels as a single minimal bit code representing a specific combination(s) of
traditionally recognized color values such as U, V or H, S, where the potential codes are pre-established in
a color table transmittable to the device prior to the capturing of images, and where the potential codes
only represent those distinct U, V or H, S colors pre-known to exist, or predominantly exist, within the
anticipated foreground objects.

Claim 28: The device of claim 23, further comprising:

additional capability within the fourth algorithm to optionally represent the foreground pixels in either the
original image sensor format such as the Bayer pattern, or as separate color information in one of the
following formats:

- the U, V, or H, S or similar traditional encoding of the original sensor format designated for either
each and every pixel or blocks of pixels, or

- a set of color underlay regions defined to be all contiguous pixels, exceeding some minimum
count, within a set range of one or more of the original sensor or traditional color values, where
each region is encoded as some representation of its encompassing edge plus a single
representation of the interior region's assigned color in a format such as U, V or H, S;

and as separate grayscale (luminosity) information in one of the following formats:

- the Y or I or similar traditional encoding of the original sensor format designated for either each
and every pixel or blocks of pixels, or

- a set of grayscale overlay regions defined to be all contiguous pixels, exceeding some minimum
count, within a set range of one or more of the original sensor or traditional grayscale values,
where each region is encoded as some representation of its encompassing edge plus a single
representation of the interior region's assigned grayscale in a format such as Y or I.

Claim 29: The device of claim 28, further comprising:

additional capability within the fourth algorithm to optionally represent the unique color of each foreground
pixel, block of pixels or region of pixels as a single minimal bit code representing a specific
combination(s) of traditionally recognized color values such as U, V or H, S, where the potential codes are
pre-established in a color table transmittable to the device prior to the capturing of images, and where the
potential codes only represent those distinct U, V or H, S colors pre-known to exist, or predominantly
exist, within the anticipated foreground objects.

Claim 30: The device of claim 23, further comprising:

a fifth algorithm operated by computing elements within the device, and capable of receiving the output of
the third algorithm, for detecting within all extracted foreground block pixels the presence of object
shapes, where any and all recognizable shapes are translated into symbolic representations potentially
including information describing approximate shape, such as fitted circles, ellipses, curves or rectangles,
the relative locations of the shapes within the blocks, the shape centroid and potentially its orientation, and
where all symbolic representation information is then output in association with each extracted block.

Claim 31: The device of claim 23, without the capability of inputting video streams from external
connected cameras, further comprising:

one or more fixed video image sensors within the device, each capturing a single video stream of full
frames for processing in replacement of the externally input video streams.

Claim 32: A method for automatically creating a content database describing an event, where the event
comprises activities conducted over time by at least one participant potentially using one or more articles
and conducted in a preset area, wherein all participants, portions of participants or articles are collectively
the foreground objects of the event, comprising the steps of:

simultaneously and periodically capturing video images of some or all of the preset area using fixed video
cameras each with a defined X, Y, Z location with respect to the preset area;

extracting from the video images only those blocks of pixels pertaining to the foreground objects in order
to create an extracted foreground blocks database at least including all foreground pixels and some
representation of each block's associated image row and column coordinates and the capturing video
camera's X, Y, Z field-of-view location, and

detecting within the extracted foreground block pixels the presence of pre-known object shapes and
identities in order to create an object tracking database at least including mathematical representations of
all detected shapes and their ongoing X, Y locations with respect to the preset area.

Claim 33: The method of claim 32 wherein if the preset area is substantially illuminated by one or more
alternating current driven light sources, comprising the additional steps of:

converting the sinusoidal waveform of the alternating current powering the light sources into a trigger
waveform synchronized to some full or fractional beat of the lighting discharge cycle, and

optionally using the trigger waveform to automatically control the shutter of the fixed video camera(s)
thereby assuring consistent lighting conditions between successive video images.

Claim 34: The method of claim 32 comprising the additional step of:

using the information within the extracted foreground blocks database to create a single composite view of
some or all of the preset area per each simultaneously captured set of video images, where the video stream
output forms a composite video image database.

Claim 35: The method of claim 32 comprising the additional step of:

using the information within the extracted foreground blocks database to create a single composite view of
some or all of the preset area per each simultaneously captured set of video images, where the video stream
output is additionally transformed into line art images using a gradient function forming a composite line
art image database.
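
The gradient-based line art transformation of Claim 35 can be sketched with a simple forward-difference gradient and a threshold. The claim names only "a gradient function", so the particular difference operator, the threshold value and the function name are assumptions; the image is taken to be a list of grayscale rows.

```python
def line_art(image, threshold=50):
    """Return a binary image with 1 wherever the local gradient exceeds threshold."""
    rows, cols = len(image), len(image[0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Forward differences; the last row and column have no neighbor, so use 0 there.
            gx = image[r][c + 1] - image[r][c] if c + 1 < cols else 0
            gy = image[r + 1][c] - image[r][c] if r + 1 < rows else 0
            if abs(gx) + abs(gy) > threshold:
                result[r][c] = 1
    return result
```

Applied to a composite view, this keeps only the outlines of objects, which is a much smaller dataset than the full-color image.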

Claim 36: The method of claim 32 comprising the additional step of:

using the object identity and location information within the object tracking database to automatically
direct the panning, tilting and / or zooming movements of one or more fixed perspective videoing cameras,
where the video stream outputs from the one or more cameras form a perspective video database
synchronized with the object tracking database.

Claim 37: The method of claim 36 comprising the additional step of:

storing in the perspective video database the directed pan and tilt angle and / or zoom depth of each
captured image from each perspective video camera along with the camera's fixed X, Y, Z field-of-view
location.

Claim 38: The method of claim 36 comprising the additional steps of:

placing the automatic panning, tilting and / or zooming perspective video cameras onto tracks allowing
their pivotal centers to be controllably moved to various locations, and

using the object identity and location information within the object tracking database to additionally
automatically direct the track movements of the one or more perspective videoing cameras.

Claim 39: The method of claim 38 comprising the additional step of:

storing in the perspective video database the directed track location of each captured image from each
location movable perspective video camera representing each camera's moving X, Y, Z field-of-view
location.




Claim 40: The method of claim 42 comprising the additional step of:

capturing video and audio information of commentators describing the event forming a commentator video
/ audio database synchronized with the object tracking database.

Claim 41: The method of claim 32 comprising the additional steps of:

using continuous recordings of the ambient sounds of the event to create volume and tonal mappings as
tokens for storage in a volume and tonal maps database, and

subsequently decoding these volume and tonal mappings to recreate a similar ambient sound as the original
event.

Claim 42: The method of claim 32 where the participant's outer clothing bears unique indicia, comprising
the additional steps of:

using object shape identity and location information within the object tracking database to automatically
direct the panning, tilting and / or zooming movements of one or more perspective participant id cameras
in order to capture at least one image of the clothing's unique indicia for each participant;

extracting from the at least one image of each clothing's unique indicia that block of pixels pertaining to
the unique indicia in order to create extracted foreground blocks for comparison to a pre-^ied database of
potential indicia, and

detecting within the extracted foreground block pixels the presence of identification markings in order to
uniquely identify each participant within the object tracking database.
Claim 43: The method of claim 32 comprising the additional step of:

detecting within the extracted foreground block pixels the presence of identification markings in order to
uniquely identify each participant within the object tracking database.
Claim 44: The method of claim 43 comprising the additional step of:

using the object shape and participant identity and location information within the object tracking database
to create a single composite view of some or all of the preset area per each simultaneously captured set of
tracking data, where the identified object shape stream output forms a composite symbolic image database.
Claim 45: The method of claim 44 comprising the additional step of:

using the information within the extracted foreground blocks database to create a single composite view of
some or all of the preset area per each simultaneously captured set of video images, where the video stream
output forms a composite video image database. 
Claim 46: The method of claim 45 comprising the additional step of:

using the identified object shape stream to create graphic overlays placed onto the single composite
forming a composite video image with graphic overlays database.
Claim 47: The method of claim 44 comprising the additional step of:

using the information within the extracted foreground blocks database to create a single composite view of
some or all of the preset area per each simultaneously captured set of video images, where the video stream
output is additionally transformed into line art images using a gradient function forming a composite line
art image database.

Claim 48: The method of claim 47 comprising the additional step of:




using the identified object shape stream to create graphic overlays placed onto the single composite
forming a composite line art image with graphic overlays database.
Claim 49: The method of claim 43 comprising the additional step of:

placing identification markings onto the participants to be detected within the extracted foreground block 
pixels. 

Claim 50: The method of claim 43 comprising the additional step of:

using object shape and participant identity and location information within the object tracking database to
form a performance measurement database and summarized into a performance descriptors database 
including discrete foreground object actions occurring within the event, where each action has a starting 
and ending time synchronized with the object tracking database. 
Claim 51: The method of claim 50 comprising the additional step of:

using object shape and participant identity and location information within the object tracking database to
create specific measurements, statistics and analysis that are added to the performance measurement
database and summarized into a performance descriptors database.
Claim 52: The method of claim 51 comprising the additional steps of:

using information within the performance measurement and descriptors databases to create sequences of 
tokens encoding descriptions of the event activities, performance assessments, and summary statistics, 
forming a commentator descriptors database synchronized with the object tracking database, and 
subsequently decoding and synthesizing the sequence of descriptor tokens into audible speech and event 
sound representations for describing the event.
Claim 53: The method of claim 43 comprising the additional step of:

using the object and participant identity and location information within the object tracking database to
automatically direct the panning, tilting and / or zooming movements of one or more fixed perspective
videoing cameras, where the video stream outputs from the one or more automatically directed cameras
form an automatic perspective video database synchronized with the object tracking database. 
Claim 54: The method of claim 53 comprising the additional step of:

using the object shape and participant identity and location information within the object tracking database
along with the pan and tilt angle and / or zoom depth of each automatically directed perspective video
camera, as well as the camera's fixed X, Y, Z field-of-view location, in order to create a separate
camera-frame database listing each identified participant and / or article visible within each successive captured
image, for each automatically directed perspective camera in use, along with the participant's and / or
article's relative intra-frame coordinates at least including the shape centroid, as well as the distance from
the centroid to the camera, providing a means for selecting all image sequences from any video cameras
containing selected participants and / or articles. 
Claim 55: The method of claim 53 comprising the additional step of:

storing in the perspective video database the directed pan and tilt angle and / or zoom depth of each
captured image from each fixed perspective video camera along with the camera's fixed X, Y, Z
field-of-view location.




Claim 56: The method of claim 53 comprising the additional steps of:

placing the automatic panning, tilting and / or zooming fixed perspective video cameras onto tracks
allowing their pivotal centers to be controllably moved to various locations, and

using the object identity and location information within the object tracking database to additionally
automatically direct the track movements of the one or more perspective videoing cameras.
Claim 57: The method of claim 56 comprising the additional step of:

storing in the perspective video database the directed track location of each captured image from each
location movable perspective video camera representing each camera's moving X, Y, Z field-of-view 
location. 

Claim 58: The method of claim 43 comprising the additional step of:

capturing the video output of one or more manually directed perspective videoing cameras, where the
video stream outputs from the one or more manually directed cameras form a manual perspective video
database synchronized with the object tracking database. 
Claim 59: The method of claim 58 comprising the additional steps of: 

automatically detecting the location and orientation of each one or more manually directed cameras as they 
capture each image in their video streams forming a manual camera location and orientation database 
synchronized with the manual perspective video database, and 

using the object shape and participant identity and location within the object tracking database along with the
detected location and orientation of each manually directed camera in order to create a separate
camera-frame database listing each identified participant and / or article visible within each successive manually
captured image, for each manually directed camera in use, along with the participant's and / or article's
relative intra-frame coordinates at least including the shape centroid, as well as the distance from the
centroid to the camera, providing a means for selecting all image sequences from any video cameras
containing selected participants and / or articles.
Claim 60: The method of claim 43 comprising the additional steps of:

tracking the identities and locations of event spectators of some relationship to event participants, 
concurrent with the movements of the related participants as represented in the object tracking database, 
forming a spectator tracking database synchronized with the object tracking database, and

using the spectator identity and location information within the spectator tracking database to
automatically direct the panning, tilting and / or zooming movements of one or more fixed spectator
videoing cameras, where the video stream outputs from the one or more automatically directed cameras
form an automatic spectator video database synchronized with the spectator tracking database.
Claim 61: The method of claim 60 comprising the additional step of:

storing in the spectator video database the directed pan and tilt angle and / or zoom depth of each captured
image from each spectator video camera along with the camera's fixed X, Y, Z field-of-view location.
Claim 62: The method of claim 60 comprising the additional steps of:

placing the automatic panning, tilting and / or zooming fixed spectator video cameras onto tracks allowing
their pivotal centers to be controllably moved to various locations, and




using the spectator identity and location information within the spectator tracking database to additionally
automatically direct the track movements of the one or more spectator videoing cameras. 
Claim 63: The method of claim 62 comprising the additional step of: 

storing in the spectator video database the directed track location of each captured image from each
location movable spectator video camera representing each camera's moving X, Y, Z field-of-view 
location. 

Claim 64: The method of claim 60 comprising the additional step of:

recording specific audio streams from tracked spectators based upon specific movements of related
participants and adding this information to the spectator video database forming a spectator video / audio 
database. 

Claim 65: A method for automatically creating compressed video of an event, where the event comprises
activities conducted over time by at least one participant potentially using one or more articles and
conducted in a preset area, wherein all participants, portions of participants or articles are collectively the
foreground objects of the event, comprising the steps of:

using fixed video cameras, each with a defined X, Y, Z location with respect to the preset area, to capture
at least one background image of some or all of the preset area per camera prior to the presence of
foreground objects;

using the same fixed video cameras, to simultaneously and periodically capture ongoing current images for
some or all of the duration of the event;

using comparisons, in either the original or transformed formats, of the current video images versus the
stored backgrounds in order to extract from the current video images only those blocks of pixels
encompassing one or more pixel contiguous foreground objects, where within each block all background
pixels are indicated or otherwise set to some null value, and where each block's row and column
coordinates within each current image are associated with the block, and

storing only the extracted foreground blocks, or some derivative thereof such as a video or gradient (line
art) format, into a concurrently unsorted and optionally compressed fixed video database absent of
background pixels, along with some representation of each block's associated current image row and
column coordinates and the capturing video camera's X, Y, Z field-of-view location.
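
The extraction step of claim 65 can be illustrated with a minimal sketch. It is not the applicants' implementation: it assumes grayscale images held as nested lists, a hypothetical fixed difference `threshold`, and it returns one block bounding every changed pixel rather than one block per contiguous object. The function name `extract_foreground_block` is invented for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale rows (an assumption; the claim is format-agnostic)
Block = List[List[Optional[int]]]


def extract_foreground_block(
    background: Image, current: Image, threshold: int = 20
) -> Optional[Tuple[int, int, Block]]:
    """Return (row, col, block) bounding all changed pixels, or None.

    Background pixels that fall inside the block are set to None,
    standing in for the claim's "null value".
    """
    # A pixel counts as foreground when it differs enough from the stored background.
    changed = {
        (r, c)
        for r, row in enumerate(current)
        for c, value in enumerate(row)
        if abs(value - background[r][c]) > threshold
    }
    if not changed:
        return None
    top = min(r for r, _ in changed)
    bottom = max(r for r, _ in changed)
    left = min(c for _, c in changed)
    right = max(c for _, c in changed)
    block = [
        [current[r][c] if (r, c) in changed else None
         for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]
    # The block's row and column coordinates travel with the block.
    return top, left, block
```

Only the returned block and its coordinates would need to be stored; everything outside it is already represented by the background image.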
Claim 66: The method of claim 65 comprising the additional step of:

using the background pixels outside the extracted blocks in order to update the stored background images
between successive current image captures.

Claim 67: The method of claim 66 comprising the additional step of:

using the indicated background pixels within each extracted block in order to update the stored background
images between successive current image captures.

Claim 68: The method of claim 65 comprising the additional steps of:

establishing an encoded color table for each potential foreground object, or object type, prior to videoing
the event comprising all of the colors, or all of the predominant colors, expected to be present in that
object, or object type, where each color is represented by a single unique binary code of fewer bits than




required to typically represent that same color in a traditional two-code color scheme such as U, V (for the
Y, U, V method) or H, S (for the H, S, I method), and

optionally using the bit encoding of the foreground color table to represent each color found on a
foreground object rather than the traditional U, V (for the Y, U, V method), H, S (for the H, S, I method) or
similar method that encodes a larger spectrum of potential colors. 
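
The saving that claim 68 describes can be sketched as follows. This is an illustration under stated assumptions, not the claimed encoder: colors are taken to be (U, V) pairs of 8 bits each, the table is built from a hypothetical list of expected colors, and pixels are mapped to the nearest table entry by squared distance. All function names are invented.

```python
import math
from typing import Dict, List, Tuple

Color = Tuple[int, int]  # a (U, V) chroma pair, assumed 8 bits per component


def build_color_table(expected_colors: List[Color]) -> Dict[Color, int]:
    """Assign each distinct expected color a small integer code."""
    unique = sorted(set(expected_colors))
    return {color: code for code, color in enumerate(unique)}


def bits_per_code(table: Dict[Color, int]) -> int:
    """Bits needed to distinguish every entry of a non-empty table."""
    return max(1, math.ceil(math.log2(len(table))))


def encode_pixel(table: Dict[Color, int], color: Color) -> int:
    """Map a pixel to the code of the nearest table color."""
    nearest = min(
        table,
        key=lambda t: (t[0] - color[0]) ** 2 + (t[1] - color[1]) ** 2,
    )
    return table[nearest]
```

With three expected colors, each pixel's chroma needs 2 bits instead of the 16 bits of a full (U, V) pair, at the cost of representing only colors known in advance.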
Claim 69: The method of claim 68 comprising the additional steps of:

separating each original full-block into underlay representations only containing color information and 
overlay representations only containing grayscale (luminosity) information; 

creating enclosed color regions for all contiguous pixels of the same color, or range of colors, within the 
underlay representation; 

creating enclosed grayscale regions for all contiguous pixels of the same grayscale, or range of grayscales, 
within the overlay representation; 

dynamically estimating a compression ratio for each block or temporal series of blocks, where each block
is treated as separate underlay and overlay representations that are encoded as outlines of each region plus
the interior respective color or grayscale;

dynamically estimating a compression ratio for each same block or same temporal series of blocks, where
each block is treated without separation into underlay and overlay representations and is encoded as
traditional macro-blocks or in some other method, and

comparing the potential compression benefits of the separated underlay-overlay representations to the
original non-separated representation and optionally implementing the highest compression for that
particular block or temporal series of blocks.
Claim 70: The method of claim 69 comprising the additional steps of: 

using the color table to identify skin regions within the color underlay representation that are then mapped
onto the grayscale overlay representation, and

selectively encoding the grayscale of the mapped skin color regions within the overlay representation with
a greater range of values than regions of non-skin colors.

Claim 71: The method of claim 65 wherein if the preset area is substantially illuminated by one or more
alternating current driven light sources, comprising the additional steps of:

converting the sinusoidal waveform of the alternating current powering the light sources into a trigger
waveform synchronized to some full or fractional beat of the lighting discharge cycle, and

optionally using the trigger waveform to automatically control the shutter of the fixed video camera(s)
thereby assuring consistent lighting conditions between successive video images.
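
The timing idea in claim 71 can be sketched numerically. The sketch assumes that a lamp on AC mains brightens twice per electrical cycle, so one lighting "beat" lasts 1 / (2 x mains frequency), and that the shutter fires on a whole number of beats; fractional beats are not modeled. The function name and parameters are hypothetical.

```python
from typing import List


def trigger_times(
    mains_hz: float, beats_per_frame: int, count: int, phase_s: float = 0.0
) -> List[float]:
    """Shutter times (seconds) locked to the lighting discharge cycle.

    Firing every `beats_per_frame` lighting periods samples the same
    phase of the lamp's brightness curve on every exposure.
    """
    lighting_period = 1.0 / (2.0 * mains_hz)  # assumed: two peaks per AC cycle
    return [phase_s + k * beats_per_frame * lighting_period for k in range(count)]
```

For 60 Hz mains and two beats per frame, the exposures fall 1/60 s apart, each at the same point of the lamp's brightness cycle.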
Claim 72: The method of claim 71 comprising the additional step of:

detecting within the extracted foreground block pixels the presence of pre-known object shapes and 
identities in order to create an object tracking database at least including mathematical representations of 
all detected shapes and their ongoing X, Y locations with respect to the preset area.
Claim 73: The method of claim 72 comprising the additional steps of: 




arranging a multiplicity of fixed video cameras into a first grid such that their combined fields-of-view
form a single contiguous and substantially overhead view of the preset area, and

arranging all adjacent fields-of-view within the first grid such that they overlap at a height in excess of at
least the tallest targeted free-standing foreground object, such as a participant, in all adjacent camera
views.

Claim 74: The method of claim 73 comprising the additional step of: 

arranging one or more fixed video camera(s) such that their field(s)-of-view are at a perspective orientation 
to the event, as opposed to the substantially overhead view of the first grid. 

Claim 75: The method of claim 74 where the one or more perspective fixed video camera(s) are capable of 
electronically controllable panning, tilting and / or zooming, comprising the additional step of:

using the foreground object tracking information within the object tracking database to automatically direct
the panning, tilting and / or zooming movements of each perspective fixed video camera, where the video
stream outputs from each perspective camera form a perspective video database synchronized with the 
object tracking database. 

Claim 76: The method of claim 75 comprising the additional steps of:

automatically directing the current pan angle, tilt angle, and / or zoom depth of each perspective fixed
video camera, to the precision and accuracy of some repeatable increment;

controllably limiting the capture of images by each perspective fixed video camera to occur only at
repeatable pan, tilt and / or zoom increments, and

including within the perspective video database some representation of the combination pan angle, tilt
angle and / or zoom depth at which each image was captured.
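
Restricting capture to repeatable increments, as claim 76 recites, amounts to snapping each requested setting to a grid. The sketch below is an assumption-laden illustration: the step sizes (0.5 degrees for pan and tilt, 1.0 for zoom) and the function names are hypothetical, and a real controller would also bound the values to the camera's mechanical range.

```python
from typing import Tuple


def snap(value: float, increment: float) -> float:
    """Round a requested setting to the nearest repeatable increment."""
    return round(value / increment) * increment


def repeatable_setting(
    pan: float,
    tilt: float,
    zoom: float,
    pan_step: float = 0.5,   # hypothetical increment, degrees
    tilt_step: float = 0.5,  # hypothetical increment, degrees
    zoom_step: float = 1.0,  # hypothetical increment, zoom units
) -> Tuple[float, float, float]:
    """Return the (pan, tilt, zoom) combination at which an image may be captured."""
    return (snap(pan, pan_step), snap(tilt, tilt_step), snap(zoom, zoom_step))
```

Because every captured image carries one of a finite set of (pan, tilt, zoom) combinations, the same tuple can later serve as a lookup key into a set of pre-captured backgrounds.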
Claim 77: The method of claim 76 comprising the additional steps of:

automatically directing each perspective camera through some or all of its total limited repeatable
increment settings of pan angles, tilt angles, and / or zoom depths such that the camera's field-of-view is
swept across some or all of the encompassed preset area prior to the area's occupation by foreground
objects, and

controllably capturing one perspective background video image per unique combination of some or all of
the automatically directed settings of pan angles, tilt angles, and / or zoom depths, where the background
video image outputs form a background image database per each camera including some representation of
the combination pan angle, tilt angle and / or zoom depth at which each background image was captured as
well as the camera's fixed X, Y, Z field-of-view location with respect to the preset area.
Claim 78: The method of claim 77 comprising the additional steps of:

controllably assuring that each current image captured during the event by each perspective camera is at a
combination of pan angle, tilt angle and zoom depths that either directly corresponds to an associated
background image or is interpolable between neighboring associated background images,

using comparisons in their original or modified formats of each current video image and either the directly
associated, or an interpolation of the neighboring associated, background image(s) in order to extract from
the current video image only those blocks of pixels pertaining to the foreground objects, where within each




block all background pixels are optionally indicated or otherwise set to some null value, and where each 
block's row and column coordinates within each current image are associated with that block, and 
storing only the extracted foreground blocks, or some derivative thereof, into an optionally compressed 
perspective video database along with some representation of each block's associated current image row 
and column coordinates and the capturing perspective video camera's X, Y, Z field-of-view location and
current pan angle, tilt angle and zoom depth. 
Claim 79: The method of claim 78 comprising the additional step of:

using the background pixels outside the extracted blocks in order to update the stored background images
between successive current image captures.

Claim 80: The method of claim 78 comprising the additional step of:

using the indicated background pixels within each extracted block in order to update the stored background
images between successive current image captures.

Claim 81: The method of claim 78 comprising the additional steps of:

spatially sorting all concurrent extracted foreground blocks for each single image capture period within
each single contiguous field-of-view, where the contiguous field-of-view is either created by a multiplicity
of fixed cameras such as the first grid or by each separate pan, tilt and zoom capable camera, such as the
motion controllable perspective video cameras, into distinct categories based upon foreground objects, at
least including:

- Blocks that contain a single participant with or without carried or worn articles;
- Blocks that contain a single free moving article;
- Blocks that contain a single participant with or without carried and worn articles and with an
overlapping free moving article;
- Blocks that contain overlapping participants regardless of carried, worn or free moving articles;

temporally grouping each spatially sorted concurrent block with any and all of the periodically consecutive
similar blocks determined to contain the same:

- single participant with or without carried or worn articles;
- single free moving article;
- single participant with or without carried and worn articles and with an overlapping free moving
article;
- overlapping participants regardless of carried, worn or free moving articles, and

storing and optionally compressing as individual streams each spatially sorted and temporally grouped
sequence of similar blocks, or some derivative thereof, along with some representation of each block's
associated current image row and column coordinates and the capturing video camera's X, Y, Z
field-of-view location and, if applicable such as with the motion controllable perspective video cameras, the current
pan angle, tilt angle and zoom depth, where the total collection of individually stored streams forms a
concurrently sorted fixed video database per each single contiguous field-of-view.
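
The sorting and grouping of claim 81 can be sketched as a classifier plus a keyed grouping. This is a simplified illustration: each block is assumed to arrive already labeled with hypothetical participant and free-article identifiers, the category names are shortened, and the grouping does not verify that frames are strictly consecutive.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BlockRecord = Tuple[int, List[str], List[str]]  # (frame number, participant ids, free article ids)
StreamKey = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def classify_block(participant_ids: List[str], free_article_ids: List[str]) -> str:
    """Spatial sort: name the category a block falls into."""
    if len(participant_ids) >= 2:
        return "overlapping participants"
    if len(participant_ids) == 1:
        if free_article_ids:
            return "participant with free article"
        return "single participant"
    if len(free_article_ids) == 1:
        return "single free article"
    return "unclassified"


def group_streams(blocks: List[BlockRecord]) -> Dict[StreamKey, List[int]]:
    """Temporal grouping: collect frame numbers per (category, objects) stream."""
    streams: Dict[StreamKey, List[int]] = defaultdict(list)
    for frame_no, participants, articles in blocks:
        key = (
            classify_block(participants, articles),
            tuple(sorted(participants)),
            tuple(sorted(articles)),
        )
        streams[key].append(frame_no)
    return dict(streams)
```

Each resulting stream holds blocks showing the same object or objects over time, which is what makes the later per-stream alignment and compression steps possible.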
Claim 82: The method of claim 81 comprising the additional steps of:




determining a shape centroid for each detected pre-known participant and article within each extracted
block;

determining a block centroid for each extracted block equal to:

- the single shape centroid if the block contains only a single participant or article;
- the participant shape centroid if the block contains only one participant and one or more articles;
- the spatially averaged centroid derived from each participant's shape centroid if the block
contains more than one participant, regardless of other articles;

centrally aligning all similar blocks in a specific stream according to their block centroids;

expanding each block as necessary to fill out a dynamically or pre-determined carrier frame size, where all
new pixels added during the expansion are set to null background pixels, and

optionally performing spatial and temporal compression on each centrally aligned stream.
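
The centroid rule and carrier-frame expansion of claim 82 can be sketched as below. The sketch makes several assumptions: centroids are (row, column) pairs, null background pixels are represented by `None`, the block centroid is rounded to integer block coordinates before alignment, and any pixel that would fall outside the carrier frame is dropped. Function names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Block = List[List[Optional[int]]]


def block_centroid(participants: List[Point], articles: List[Point]) -> Point:
    """Apply the three-way centroid rule to one extracted block."""
    if len(participants) >= 2:
        n = len(participants)
        return (sum(p[0] for p in participants) / n,
                sum(p[1] for p in participants) / n)
    if len(participants) == 1:
        return participants[0]
    if len(articles) == 1:
        return articles[0]
    raise ValueError("block must contain a participant or exactly one article")


def center_in_carrier(
    block: Block, centroid_rc: Tuple[int, int], frame_h: int, frame_w: int
) -> Block:
    """Place the block so its centroid lands at the carrier frame's center."""
    frame: Block = [[None] * frame_w for _ in range(frame_h)]  # null background
    row_shift = frame_h // 2 - centroid_rc[0]
    col_shift = frame_w // 2 - centroid_rc[1]
    for r, row in enumerate(block):
        for c, value in enumerate(row):
            fr, fc = r + row_shift, c + col_shift
            if 0 <= fr < frame_h and 0 <= fc < frame_w:
                frame[fr][fc] = value
    return frame
```

Aligning every block of a stream on the same center keeps the tracked object stationary within the carrier frame, so frame-to-frame differences come mostly from the object's own motion.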
Claim 83: The method of claim 81 comprising the additional step of:

detecting within the extracted foreground block pixels the presence of identification markings in order to
uniquely identify each participant within the object tracking database.
Claim 84: The method of claim 83 comprising the additional steps of:

placing identification markings onto the participants to be detected within the extracted foreground block
pixels. 

Claim 85: The method of claim 83 where the participant's outer clothing bears unique indicia, comprising
the additional steps of:

using information within the object tracking database to automatically direct the panning, tilting and / or
zooming movements of one or more perspective participant id cameras in order to capture at least one
image of the clothing's unique indicia for each participant;

extracting from the at least one image of each clothing's unique indicia that block of pixels pertaining to
the unique indicia in order to create the extracted foreground blocks for comparison to the pre-stored
database of potential indicia. 

Claim 86: The method of claim 83 comprising the additional steps of:

creating a separate camera-frame database listing each identified participant and / or article visible within
each successive captured image, for each camera in use, along with the participant's and / or article's
relative intra-frame coordinates at least including the shape centroid, as well as the distance from the
centroid to the camera, providing a means for selecting all image sequences containing selected 
participants and / or articles. 

Claim 87: The method of claim 86 comprising the additional steps of:

using the tracking and identity information within the object tracking database to record the identity of
each and every participant that may be in any given extracted block within any given stream, and

splitting individual original full-blocks containing two or more identified participants into one split-block
per main participant, where each split-block contains at least some portion of the respective main
participant and potentially portions of other overlapping participants, such that each new split-block has
essentially been reclassified as compared to its originating full-block, and




resorting and regrouping all reclassified new split-blocks into their own new streams or into other existing
streams already created for the given main participants.

Claim 88: The method of claim 87 comprising the additional steps of:

optionally performing the equivalent of a digital zooming, either in or out, on any and each successive
block in a specific stream determined by some combination of the camera's zoom depth setting for the
given block as well as the distance between the block's centroid and the camera, so as to normalize all
blocks within the stream to a roughly equivalent distance from the camera and therefore roughly equivalent
intra-frame object size, and

associating a factor representing the determined digital zoom transformation with each block providing a
means for reversing the transformation during decoding. 
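
One way to picture the normalization in claim 88 is with a simple pinhole-style assumption: apparent object size grows in proportion to optical zoom and shrinks in proportion to distance. Under that assumption, which the claim itself does not state, the digital zoom factor and its inverse could be computed as in this sketch. The reference distance, reference zoom and function names are hypothetical.

```python
def digital_zoom_factor(
    distance: float, zoom: float, ref_distance: float, ref_zoom: float = 1.0
) -> float:
    """Scale to apply to a block so the object appears as it would at the reference.

    Assumes apparent size is proportional to zoom / distance.
    """
    return (distance / ref_distance) * (ref_zoom / zoom)


def reverse_factor(factor: float) -> float:
    """Scale the decoder applies to undo the normalization."""
    return 1.0 / factor
```

An object twice as far away as the reference but captured at twice the zoom needs no digital scaling (factor 1.0); at the same zoom it would be enlarged by a factor of 2.0, and the stored factor lets the decoder shrink it back.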

Claim 89: The method of claim 87, when compressing the streams using a standard group of pictures
technique, the additional steps of:

determining the current pose of the participant and / or articles within the "I" (independent) first frame of
the group of pictures;

encoding the "I" frame with respect to a standard pose, referred to by a pose number and pre-existing in a
library of standard poses, where the standard pose best matches the current pose, and

including the pose number with the "I" frame information for use by the decoder that has available the
same standard library of poses.

Claim 90: The method of claim 89, when compressing the streams using a standard group of pictures
technique, the additional steps of:

detecting the pose of each object, in each frame, in each group of pictures, for all groups in all streams;

comparing the detected pose to a historical list of known poses for the given event and / or other related
events, where the same historical list is maintained within both the encoder and decoder;

adding any pose of a sufficiently different arrangement to both the encoder and decoder's historical list, to
be referenced by the same distinct pose number in both lists, and

allowing all temporally subsequent current poses of "I" frames to be compared and referenced to the newly
added and all existing poses in the historical list, as well as all poses in any standard list.
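
The pose referencing of claims 89 and 90 can be sketched as a nearest-match lookup against a growing list. This is an illustration only: poses are assumed to be fixed-length numeric vectors, similarity is a sum of squared differences, and "sufficiently different" is modeled by a hypothetical `novelty_threshold`. The class and method names are invented, and keeping the decoder's list in sync would require sending it each newly added pose.

```python
from typing import List, Sequence, Tuple

Pose = Tuple[float, ...]


def pose_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences between two pose vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class PoseLibrary:
    """Standard poses plus a historical list that grows during the event."""

    def __init__(self, standard_poses: List[Sequence[float]], novelty_threshold: float):
        self.poses: List[Pose] = [tuple(p) for p in standard_poses]
        self.novelty_threshold = novelty_threshold

    def reference(self, pose: Sequence[float]) -> Tuple[int, bool]:
        """Return (pose number, whether the pose was newly added)."""
        if self.poses:
            best = min(
                range(len(self.poses)),
                key=lambda i: pose_distance(self.poses[i], pose),
            )
            if pose_distance(self.poses[best], pose) <= self.novelty_threshold:
                return best, False
        # Sufficiently different from everything known: add it under a new number.
        self.poses.append(tuple(pose))
        return len(self.poses) - 1, True
```

The encoder would then code the frame relative to the referenced pose and transmit only the pose number, plus the pose itself on the occasions it was newly added.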
Claim 91: The method of claim 73 comprising the additional steps of:

spatially sorting all concurrent extracted foreground blocks from a single image capture period across the
entire first grid, into distinct categories based upon foreground objects, at least including:

- Blocks that contain a single participant with or without carried or worn articles;
- Blocks that contain a single free moving article;
- Blocks that contain a single participant with or without carried and worn articles and with an
overlapping free moving article;
- Blocks that contain overlapping participants regardless of carried, worn or free moving articles;

temporally grouping each spatially sorted concurrent block with any and all of the periodically consecutive
similar blocks determined to contain the same:

- single participant with or without carried or worn articles;




- single free moving article;
- single participant with or without carried and worn articles and with an overlapping free moving
article;
- overlapping participants regardless of carried, worn or free moving articles;

either before or after temporal sorting, where it is determined that the same foreground object(s) was / were
image captured and extracted by two or more adjacent cameras, and where at least one adjacent camera
had a complete view of the foreground object(s) such that none of the foreground pixels touched the
outermost edge of that camera's field-of-view, keeping the best complete view extracted block and
marking as redundant all other extracted blocks of the same foreground object(s),

either before or after temporal sorting, where it is determined that the same foreground object(s) was / were
image captured and extracted by two or more adjacent cameras, and where no adjacent camera had a
complete view of the foreground object such that at least one of the foreground pixels touched the
outermost edge of each capturing camera's field-of-view, subsequently joining into a single new block
those two or more available partial-view extracted blocks determined to best represent a single complete
view and marking as redundant all other extracted blocks of the same foreground object(s), and

storing and optionally compressing each stream of spatially sorted and temporally grouped sequence of
similar blocks, or some derivative thereof, regardless of the individual camera within the first grid that
captured the block, along with some representation of each block's associated current X, Y coordinates
within the preset area, where the total collection of individually compressed streams forms a concurrently
sorted overhead fixed video database.

Claim 92: The method of claim 91 comprising the additional steps of:

arranging a multiplicity of fixed video cameras into a second grid such that their combined fields-of-view
form a single contiguous and substantially overhead view of the preset area, overlapping with and offset
from the first grid, and

arranging all adjacent fields-of-view within the second grid such that they overlap at a height in excess of
at least the tallest targeted free-standing foreground object, such as a participant, in both adjacent camera
views.

Claim 93: The method of claim 92 comprising the additional step of:

classifying, concurrently sorting and temporally grouping the extracted blocks from the second grid
intermixed with the streams formed from the first grid;

comparing all concurrent intermixed first and second grid blocks within a single stream in order to identify
which overlapping first or second grid camera has captured a block that is both fully within its field-of-view
and closer to its center-of-view than any of the other overlapping cameras, marking as redundant all
other concurrent blocks;

comparing all concurrent intermixed first and second grid blocks within a single stream in order to identify
which overlapping first or second grid cameras have captured the best two or more blocks for joining into a
single block, where none of the overlapping cameras captured a single block in full view, marking as
redundant all other concurrent blocks, and



storing and optionally compressing each stream of spatially sorted and temporally grouped sequence of
similar blocks, or some derivative thereof regardless of the individual camera within the first or second
grid that captured the block, along with some representation of each block's associated current X, Y
coordinates within the preset area, where the total collection of individually compressed streams forms a 
concurrently sorted overhead fixed video database. 
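
The first comparison step of claim 93 can be sketched as a small selection routine. This is only an illustration under stated assumptions: the `Block` record, its fields and the function name `select_primary_block` are hypothetical, and the distance to the center-of-view is assumed to have been measured already.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """Hypothetical record for one extracted foreground block."""
    camera_id: str
    fully_in_view: bool      # no foreground pixel touches the edge of the view
    dist_to_center: float    # distance from the block to the camera's center-of-view
    redundant: bool = False


def select_primary_block(blocks: List[Block]) -> Optional[Block]:
    """Among concurrent blocks of the same object, keep the full-view block
    closest to its camera's center-of-view and mark all others redundant.

    Returns None when no camera captured a full view; that case corresponds
    to the claim's second comparison step (joining partial blocks), which is
    not implemented here.
    """
    full_view = [b for b in blocks if b.fully_in_view]
    if not full_view:
        return None
    best = min(full_view, key=lambda b: b.dist_to_center)
    for b in blocks:
        b.redundant = b is not best
    return best


# Example: two grids see the same player; one view is clipped at its edge.
candidates = [
    Block("grid1-cam3", fully_in_view=True, dist_to_center=40.0),
    Block("grid2-cam1", fully_in_view=True, dist_to_center=12.5),
    Block("grid1-cam4", fully_in_view=False, dist_to_center=5.0),
]
chosen = select_primary_block(candidates)
print(chosen.camera_id, [b.redundant for b in candidates])
```

Blocks marked redundant are kept in the list rather than discarded, since claim 94 reuses them as additional views.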
Claim 94: The method of claim 93 comprising the additional step of:

using the overlapping extracted foreground blocks from the first and second grids, especially including
redundant blocks containing additional views of the same object, to generate Z information via
stereoscopic triangulation, for as many detected objects as are visible from two or more cameras, where the
Z information is appended to the existing object tracking database that already includes mathematical
representations of all detected shapes and their ongoing X, Y locations with respect to the preset area.
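
A minimal sketch of the triangulation idea in claim 94, reduced to a single vertical cross-section. It assumes two idealized overhead cameras at the same mounting height, and that each camera reports where its line of sight through the object point would meet the floor. The function name `triangulate_xz` and all parameters are hypothetical, and the application does not specify this formulation.

```python
from typing import Tuple


def triangulate_xz(cam1_x: float, cam2_x: float,
                   floor_x1: float, floor_x2: float,
                   mount_height: float) -> Tuple[float, float]:
    """Recover (X, Z) of a point seen by two overhead cameras.

    floor_x1 / floor_x2 are the floor positions at which each camera's ray
    through the point would land. For a point at height Z, each ray lands at
        floor_x = cam_x + (X - cam_x) * k,   with k = H / (H - Z),
    so the difference between the two landing points yields k, then Z and X.
    """
    baseline = cam2_x - cam1_x
    if baseline == 0:
        raise ValueError("cameras must be at different positions")
    k = 1.0 - (floor_x2 - floor_x1) / baseline
    if k <= 0:
        raise ValueError("observations are not consistent with a single point")
    z = mount_height * (k - 1.0) / k
    x = cam1_x + (floor_x1 - cam1_x) / k
    return x, z


# Example: cameras at x = 0 m and x = 10 m, both mounted 10 m above the floor.
x, z = triangulate_xz(cam1_x=0.0, cam2_x=10.0,
                      floor_x1=6.25, floor_x2=3.75, mount_height=10.0)
print(f"X = {x:.2f} m, Z = {z:.2f} m")
```

When both cameras report the same floor position, k equals 1 and the point lies on the floor (Z = 0), which matches the intuition that only raised points show parallax between the two views.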
Claim 95: The method of claim 94 comprising the additional steps of: 

arranging a multiplicity of fixed video cameras into a third or more grids such that each additional grid's
combined fields-of-view form a single contiguous and substantially overhead view of the preset area,
overlapping with and offset from the first and second grids, and

arranging all adjacent fields-of-view within the third or more grids such that they overlap at a height in
excess of at least the tallest targeted free-standing foreground object, such as a participant, in both adjacent
camera views. 

Claim 96: The method of claim 95 comprising the additional step of:

classifying, concurrently sorting and temporally grouping the extracted blocks from the third or more grids
intermixed with the streams formed from the first and second grid;

comparing all concurrent intermixed first, second, third or more grid blocks within a single stream in order
to identify which overlapping first, second, third or more grid camera has captured a block that is both
fully within its field-of-view and closer to its center-of-view than any of the other overlapping cameras,
marking as redundant all other concurrent blocks;

comparing all concurrent intermixed first, second, third or more grid blocks within a single stream in order
to identify which overlapping first, second, third or more grid cameras have captured the best two or more
blocks for joining into a single block, where none of the overlapping cameras captured a single block in
full view, marking as redundant all other concurrent blocks, and

storing and optionally compressing each stream of spatially sorted and temporally grouped sequence of
similar blocks, or some derivative thereof, regardless of the individual camera within the first, second,
third or more grids that captured the block, along with some representation of each block's associated
current X, Y coordinates within the preset area, where the total collection of individually compressed
streams forms a concurrently sorted overhead fixed video database.
Claim 97: The method of claim 96 comprising the additional step of:

using the overlapping extracted foreground blocks from the first, second, third or more grids, especially
including redundant blocks containing additional views of the same object, to generate Z information via
stereoscopic triangulation, for as many detected objects as are visible from two or more cameras, where the
Z information is appended to the existing object tracking database that already includes mathematical
representations of all detected shapes and their ongoing X, Y locations with respect to the preset area.
Claim 98: The method of claim 77 comprising the additional steps of:

creating a single panoramic background image for each controllably movable perspective camera, where
the panoramic is a composite of multiple background images captured at successive pan and tilt angles as
found in the background image database, preferably based upon background images captured at a zoom
depth equal to or greater than that expected to be used during event videoing, where the panoramic
background represents the total fixed background anticipated to be within the ongoing field-of-view of the
perspective camera as it is controllably moved throughout its range of pan and tilt angles, and
using the single panoramic background image to recreate individual ongoing background images, or
equivalent information, corresponding to the ongoing current image's pan / tilt and zoom settings, to be
used for comparison with the current image in support of the extraction of all foreground blocks.
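
One way to picture the second step of claim 98 is as a lookup into the panoramic image keyed by the camera's current settings. The sketch below is an assumption-laden illustration, not the application's method: it treats the panorama as a plain 2-D grid sampled at a constant number of pixels per degree, assumes rows increase with tilt and columns with pan, represents zoom by the resulting field of view in degrees, and omits resampling the window to the sensor's resolution. All names (`background_window`, `px_per_deg`, and so on) are hypothetical.

```python
from typing import List, Sequence


def background_window(panorama: Sequence[Sequence[int]],
                      pan_min_deg: float, tilt_min_deg: float,
                      px_per_deg: float,
                      pan_deg: float, tilt_deg: float,
                      hfov_deg: float, vfov_deg: float) -> List[List[int]]:
    """Cut out the part of the panorama seen at the given pan / tilt angles
    with the given horizontal and vertical field of view (set by the zoom)."""
    # Center of the current view in panorama pixel coordinates.
    col_center = (pan_deg - pan_min_deg) * px_per_deg
    row_center = (tilt_deg - tilt_min_deg) * px_per_deg
    half_w = hfov_deg * px_per_deg / 2.0
    half_h = vfov_deg * px_per_deg / 2.0
    c0, c1 = round(col_center - half_w), round(col_center + half_w)
    r0, r1 = round(row_center - half_h), round(row_center + half_h)
    if r0 < 0 or c0 < 0 or r1 > len(panorama) or c1 > len(panorama[0]):
        raise ValueError("requested view extends beyond the panorama")
    return [list(row[c0:c1]) for row in panorama[r0:r1]]


# Example: a 10 x 20 panorama at 1 pixel per degree; each pixel stores row*100 + col.
pano = [[r * 100 + c for c in range(20)] for r in range(10)]
window = background_window(pano, pan_min_deg=0.0, tilt_min_deg=0.0, px_per_deg=1.0,
                           pan_deg=10.0, tilt_deg=5.0, hfov_deg=4.0, vfov_deg=2.0)
print(len(window), len(window[0]), window[0][0])
```

The returned window plays the role of the recreated background image that the current image is compared against when foreground blocks are extracted.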



[Fig. 1 (drawing sheet 1/56): system block diagram. Recoverable block labels: Tracking System; Tracking Database; Overhead Image Database; Automatic Game Filming System; Center of View Database; Automatic Game Film Database; Interface to Manual Game Filming; Camera Loc. & Orientation Database; Manual Game Film Database; Player & Referee Identification System (using jersey numbers); Game Clock and Official Scoring Interface System; Performance Measurement & Analysis DB; Performance Descriptors; Interface to Performance Commentators; Commentator V/A Database; Commentator Descriptors Database; 3-D Venue Model Database; 3-D Ad Model Database; Automatic Content Assembly & Compression System; Encoded Broadcast; Broadcast Decoder. Reference numerals visible: 101, 102, 200, 201, 202, 300, 400, 500, 600, 800, 900.]

[Figs. 6d and 6e (drawing sheet 10/56). Recoverable labels: 10cm-CF, 10cm-a1, 10cm-a2, 10cm-Ax; block corner coordinates written as (r1,c1):(r2,c2); zoom annotations of 60%, 59%, 55%, 40%, 25% and 10%.]

[Fig. 8 (drawing sheet 17/56): no text labels recoverable.]

[Figs. 10a and 10b (drawing sheet 24/56). Panel titles: "Maximum Camera Field-of-View Separation" (Fig. 10a) and "Reduced Camera Field-of-View Separation" (Fig. 10b). Reference labels visible include 20V-2, 20V-3, 20V-5, 20V-7, 20V-8 and 22b.]

[Figs. 11d and 11e (drawing sheet 31/56): no text labels recoverable.]

[Drawing sheet 37/56: chart whose title begins "Quantum Efficiency"; the remainder of the title is illegible.]

[Figs. 16a, 16b and 16c (drawing sheet 38/56). Recoverable annotations are approximate wavelength bands: 400 - 900 nm; 400 - 500 nm (25p-B); 500 - 600 nm; 600 - 800 nm; 400 - 700 nm; 700 - 800 nm (25p-IR). The remaining reference labels are illegible.]

[Fig. 16d (drawing sheet 39/56): labeled "Monochrome Sensor".]

[Drawing sheet 40/56: two panels labeled "Monochrome Sensor", each annotated approx. 400 - 700 nm; reference labels 25r-VIS and 24L-2.]

[Drawing sheet 45/56: labeled "Tracking System".]

[Fig. 23 (drawing sheet 47/56). Recoverable labels: "Stream A - Full-Frames, No Loss" (Frames 16, 21, 26, 31 and 36 are legible); a second stream labeled "Sub-Frames, No Loss" (Sub-Frames 6.1, 11.1, 16.1, 21.1, 26.1, 31.1 and 36.1 are legible, with (r1,c1) and (r2,c2) corner annotations); "Tracked Motion Vectors" (10-mv11 is legible among the vector labels); database label 10y-db.]

[Fig. 25 (drawing sheet 49/56). Recoverable labels: one stream annotated "Variable Bandwidth, No Loss"; "Stream B1: Rotated, Centered Sub-Frames, Fixed Bandwidth, No Loss"; Sub-Frames 1.1 through 36.1 at five-frame intervals, each with a rotation annotation (values largely illegible); database label 10es-db.]

[Fig. 26 (drawing sheet 50/56). Recoverable labels: "Stream B1: Rotated, Centered Sub-Frames, Fixed Bandwidth, No Loss" and "Stream B2: Rotated, Centered and Scrubbed Sub-Frames, Fixed Bandwidth, No Loss".]

[Fig. 29a (drawing sheet 53/56). Recoverable labels: "Broadcast Encoder"; "Video Stream"; and, for each localized, normalized sub-stream: "Localization Data: Object ID", "Normalization Data: Extracted Location, Rotation, Zoom", "Pose Info: Predictive Pose Code". Reference numerals visible: 904v, 904m, 904a.]

[Fig. 29b (drawing sheet 54/56). Labels, as far as they can be read: "Streams of Associated Current P/T/Z Settings"; "Ambient Audio Recordings"; "Volume & Tonal Maps". Reference numerals visible: 202a, 202s, 203, 402b, 904v, 904m, 904a.]

[Drawing sheet 55/56: axis label "Direction of Increasing Compression" (printed inverted in the scan).]
